25 October 1992
\section{Introduction}
\label{Intro}
Numerous definitions of the term ``virtual reality'' abound these days. However, they all share one common thread: a successful system \emph{convinces} you that you are somewhere other than where you really are. Despite engineers' wishes to the contrary, just \emph{how} convincing this experience is seems to depend only weakly on raw technical statistics like megabits per second; rather, what matters more is how well this information is ``matched'' to the expectations of our physical senses. While our uses for virtual reality may encompass virtual worlds bearing little resemblance to the real world, our physical senses nevertheless still expect to be stimulated in the same ``natural'' ways that they have been for millions of years.
Two particular areas of concern for designers of current virtual reality systems are the \emph{latency} and \emph{update rate} of the visual display hardware employed. Experience has shown that poor performance in either of these two areas can quickly destroy the ``realness'' of a session, even if all of the remaining technical specifications of the hardware are impeccable. This is perhaps not completely surprising, given that the images of objects that humans interact with in the natural world are never delayed by more than fractions of milliseconds, nor are they ever ``sliced'' into discrete time frames (excluding the intervention of the technology of the past hundred years). The relative psychological importance of latency and update rate can, conversely, be used to great advantage: with suitably good performance on these two fronts, the ``convinceability factor'' of a system can withstand relatively harsh degradation of its other capabilities (such as image resolution). ``If it \emph{moves} like a man-eating beast, it probably \emph{is} a man-eating beast---even if I didn't see it clearly'' is no doubt a hard-wired feature of our internal image processing subsystem that is responsible for us still being on the planet today.
A major problem facing the system designer, however, is that presenting a sufficiently ``smooth'' display to fully convince the viewer of ``natural motion'' often seems to require an unjustifiably high computational cost. As an order-of-magnitude estimate, a rasterised display update rate of around 100 updates per second is generally fast enough that the human visual system cannot distinguish it from continuous vision. But the actual amount of information gleaned from these images by the viewer in one second is nowhere near the amount of information containable in 100 static images---as can be simply verified by watching a ``video montage'' of still photographs presented rapidly in succession. The true rate of ``absorption'' of detailed visual information is probably closer to 10 updates per second---or worse, depending on how much actual detail is taken as a benchmark. Thus, providing a completely ``smooth'' display takes roughly an order of magnitude more effort than is ultimately appreciated---not unlike preparing a magnificent dinner for twelve and then having no one else turn up.
For this reason, many designers make an educated compromise between motion smoothness and image sophistication, by choosing an update rate that is somewhere between the ∼10 updates per second rate that we absorb information at, and the ∼100 updates per second rate needed for smooth apparent motion. Choosing a rate closer to 10 updates per second requires that the participant mentally ``interpolate'' between the images presented---not difficult, but nevertheless requiring some conscious processing, which seems to leave fewer ``brain cycles'' for appreciating the virtual-world experience. On the other hand, choosing a rate closer to 100 updates per second results in natural-looking motion, but the reduced time available for each update reduces the sophistication of the graphics---fewer ``polygons per update''. The best compromise between these two extremes depends on the application in question, the expectations of the participants, and, probably most importantly, the opinions of the designer.
In the remaining sections of this paper, we outline enhancements to current rasterised display technology that permit the motion of objects to be consistently displayed at a high update rate, while allowing the image generation subsystem to run at a lower update rate, and hence provide more sophisticated images. Section~\ref{BasicPhilosophy} presents an overview of the issues that are to be addressed, and outlines the general reasoning behind the approach that is taken. Following this, in section~\ref{MinimalImplementation}, detailed (but platform-independent) information is provided that would allow a designer to ``retrofit'' the techniques outlined in section~\ref{BasicPhilosophy} to an existing system. For these purposes, as much of the current image generation design philosophy as possible is retained, and only those minimal changes required to implement the techniques immediately are described. However, it will be shown that the full benefit of the methods described in this paper, in terms of the specific needs of virtual reality, will most fruitfully be obtained by subtly changing the way in which the image generation process is currently structured. These changes, and more advanced topics not addressed in section~\ref{MinimalImplementation}, are considered in section~\ref{Enhancements}.
\section{The Basic Philosophy}
\label{BasicPhilosophy}

We begin, in section~\ref{CurrentRasters}, by reviewing the general methods by which current rasterised displays are implemented, to appreciate more fully why the problems outlined in section~\ref{Intro} are present in the first place, and to standardise the terminology that will be used in later sections. Following this, in section~\ref{Motion}, we review some of the fundamental physical principles underlying our understanding of motion, to yield further insight into the problems of depicting it accurately on display devices. These deliberations are used in section~\ref{GalAnti} to pinpoint the shortcomings of current rasterised display technology, and to formulate a general plan of attack to rectify these problems. A brief introduction to the terminology used for the general structures required to carry out these techniques is given in section~\ref{Galpixels}---followed, in section~\ref{GalpixmapStructure}, by a careful consideration of the level of sophistication required for practical yet reliable systems. Specific details about the hardware and software modifications required to implement the methods outlined are deferred to sections~\ref{MinimalImplementation} and~\ref{Enhancements}.
\subsection{Overview of Current Rasterised Displays}
\label{CurrentRasters}

Rasterised display devices for computer applications, while ubiquitous in recent years, are relatively new devices. Replacing the former \emph{vector display} technology, they took advantage of the increasingly powerful memory devices that became available in the early 1970s, to represent the display as a digital matrix of pixels, a \emph{raster} or \emph{frame buffer}, which was scanned out to the CRT line-by-line in the same fashion as the by then well-proven technology of \emph{television}.
The electronic circuitry responsible for scanning the image out from the frame buffer to the CRT (or, these days, to whatever display device is being used) is referred to as the \emph{video controller}---which may be as simple as a few interconnected electronic devices, or as complex as a sophisticated microprocessor. The \emph{refresh rate} may be defined as the reciprocal of the time required for the video controller to refresh the display from the frame buffer, and is typically in the range 25--120~Hz, to both avoid visible flicker, and to allow smooth motion to be depicted. (Interlacing is required for CRTs at the lower end of this range to avoid visible flicker; for simplicity, we assume that the display is \emph{non-interlaced} with a suitably high refresh rate.) Each complete image, as copied by the video controller from the frame buffer to the physical display device, is referred to as a \emph{frame}; this term can also be used to refer to the time interval between frame refreshes, the reciprocal of the refresh rate.
Most virtual reality systems employ two display devices to present a stereoscopic view to the participant, one display for each eye. Physically, this may be implemented as two completely separate display subsystems, with their own physical output devices (such as a twin-LCD head-mounted display). Alternatively, hardware costs may be reduced by interleaving the two video signals into a single physical output device, and relying on physically simpler (and sometimes less ``face-sucking'') demultiplexer techniques to separate the signals, such as time-domain switching (e.g.\ electronic shutter glasses), colour-encoding (e.g.\ the ``3-D'' coloured glasses of the 1950s), optical polarisation (for which we are fortunate that the photon is a vector boson, and that we have only two eyes), or indeed any other physical attribute capable of distinguishing two multiplexed optical signals. For the purposes of this paper, however, we define a \emph{logical display device} to be \emph{one} of the video channels in a twin-display device (say, the left one), or else the corresponding \emph{effective} monoscopic display device for a multiplexed system. For example, a system employing a single (physical) 120-Hz refresh-rate CRT, time-domain-multiplexing the left and right video channels into alternate frames, is considered to possess two \emph{logical} display devices, each running at a 60~Hz refresh rate. In general, we ignore completely the engineering problems (in particular, \emph{cross-talk}) that multiplexed systems must contend with. Indeed, for most of this paper, we shall ignore the stereoscopic nature of displays altogether, and treat each video channel separately; therefore, all references to the term ``display device'' in the following sections refer to \emph{one} of the two logical display devices, with the understanding that the other video channel simply requires duplicating the hardware and software of the first.
We have described above how the video controller scans frames out from the frame buffer to the physical display device. The frame buffer, in turn, receives its information from the \emph{display processor} (or, in simple systems, the CPU itself---which we shall also refer to as ``the display processor'' when acting in this role). Data for each pixel in the frame buffer is retained unchanged from one frame to the next, unless it is overwritten by the display processor in the intervening time. There are many applications of computer graphics for which this ``sample-and-hold'' nature of the frame buffer is very useful: a background scene can be painted into the frame buffer once, and only those objects that change their position or shape from frame to frame need be redrawn (together with repairs to the background area thus uncovered). This technique is often well-suited to traditional computer hardware environments---namely, those in which the display device is physically fixed in position on a desk or display stand---because a constant background view accords well with the view that we are using the display as a (static) ``window'' on a virtual world. However, this technique is, in general, ill-suited to virtual reality environments, in which the display is either affixed to, or at least in some way ``tracks'', the viewer, and thus must change even the ``background'' information constantly as the viewer moves her head.
There are several problems raised by this requirement of constant updating of the entire frame buffer. Firstly, if the display processor proceeds to write into the frame buffer at the same time as the video controller is refreshing the display, the result will often be an excessive amount of visible flicker, as partially-drawn (and, indeed, partially-erased) objects are ``caught with their pants down'' during the refresh. Secondly, once the proportion of the image requiring regular updating rises to any significant fraction, it becomes more computationally cost-effective to simply erase and redraw the entire display than to erase and redraw the individual objects that need changing. Unless the new scene can be redrawn in significantly less time than a single frame (untrue in any but the most trivial situations), the viewer will see not a succession of complete views of a scene, but rather a succession of scene-building drawing operations. This is often acceptable for ``non-immersive'' applications such as CAD (in which this ``building process'' can indeed often be most informative); it is not acceptable, however, for any convincing ``immersive'' application such as virtual reality.
The standard solution to this problem is \emph{double buffering}. The video controller reads its display information from one frame buffer, whilst at the same time a second frame buffer is written to by the display processor. When the display processor has finished rendering a complete image, the video controller is instructed to switch to the second frame buffer (containing the new image), simultaneously switching the display processor's focus to the first frame buffer (containing the now obsolete image that the video controller was formerly displaying). With this technique, the viewer sees one constant image for a number of frames, until the new image has been completed. At that time, the view is instantaneously switched to the new image, which remains in view until yet another complete image is available. Each new, completed image is referred to as an \emph{update}, and the rate at which these updates are forthcoming from the display processor is the \emph{update rate}.
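
The flow of control just described can be summarised in a few lines of C. This is only a minimal sketch: the type and routine names (\texttt{FrameBuffer}, \texttt{render\_scene}, \texttt{wait\_for\_vsync}, \texttt{set\_scanout\_buffer}) are illustrative placeholders, not the interface of any particular piece of hardware.

\begin{verbatim}
#include <stdint.h>

#define WIDTH  640
#define HEIGHT 480

typedef struct {
    uint32_t pixel[HEIGHT][WIDTH];    /* shading value for each pixel      */
} FrameBuffer;

static FrameBuffer buffers[2];

extern void render_scene(FrameBuffer *fb);       /* display processor      */
extern void wait_for_vsync(void);                /* end of current frame   */
extern void set_scanout_buffer(FrameBuffer *fb); /* video controller input */

void display_loop(void)
{
    int drawing = 0;              /* buffer the display processor writes to */
    for (;;) {
        render_scene(&buffers[drawing]);  /* may take several frame times   */
        wait_for_vsync();                 /* switch only between refreshes  */
        set_scanout_buffer(&buffers[drawing]);
        drawing = 1 - drawing;            /* old scan-out buffer is re-used */
    }
}
\end{verbatim}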
It is important to note the difference between the \emph{refresh} rate and the \emph{update} rate, and the often-subtle physical interplay between the two. The refresh rate is the rate at which the video controller reads images from the frame buffer to the display device, and is typically a constant for a given hardware configuration (e.g.\ 70~Hz). The update rate, on the other hand, is the rate at which complete new images of the scene in question are rendered; it is generally lower than the refresh rate, and usually depends to a greater or lesser extent on the complexity of the image being generated by the display processor.
It is often preferable to change the video controller's focus only \emph{between} frame refreshes to the physical display device---and not mid-frame---especially if the display processor's update rate approaches one update per frame. This is because switching the video controller's focus mid-frame ``chops'' those objects in the particular scan line that is being scanned out by the video controller at the time, which leads to visible discontinuities in the perceived image. On the other hand, this restriction means that the display processor must wait until the end of a frame to begin drawing a new image (unless a third frame buffer is employed), effectively forcing the refresh-to-update rate ratio up to the next highest integral value. This is most detrimental when the update rate is already high; for example, if an update takes only 1.2 refresh periods to be drawn, ``synchronisation'' with the frame refresh means that the remaining 0.8 of a refresh period is unusable for drawing operations.
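
The cost of this ``rounding up'' can be made concrete with a one-line calculation. The following is only a sketch; the helper name is hypothetical, and the 70~Hz figure is simply the example value used earlier.

\begin{verbatim}
#include <math.h>

/* Effective update rate when buffer switches are locked to the refresh:
 * an update needing 1.2 refresh periods of drawing time occupies 2 whole
 * frames, so e.g. 70 Hz refresh and 1.2-frame updates give only 35 Hz.   */
double effective_update_rate(double refresh_hz, double render_seconds)
{
    double frame_time        = 1.0 / refresh_hz;
    double frames_per_update = ceil(render_seconds / frame_time);
    return refresh_hz / frames_per_update;
}
\end{verbatim}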
Regardless of whether the frame-switching circuitry is synchronised to the frame rate or not, if the update rate of the display processor is in fact slower than the refresh rate (the usual case), then the same static image persists on the display device for a number of frames, until a new image is ready for display. For \emph{static} objects on the display, this ``sample-and-hold'' technique is ideal: the image's motion (i.e.\ no motion at all!) is correctly depicted at the (high) refresh rate, even though the image itself is only being generated at the (lower) update rate. This phenomenon, while appearing quite trivial in today's rasterised-display world, is in fact a major advance over the earlier vector-display technology: the video processor, utilising the frame buffer, effectively \emph{fills in the information gaps} between the images supplied by the display processor. Recognition of the remarkable power afforded by this feat of ``interpolation''---and, more importantly, a critical assessment of how this ``interpolation'' is currently carried out---is essential to appreciating the modifications that will be suggested shortly.
As mentioned in section~\ref{Intro}, the \emph{latency} (or ``time lag'') of a system in general, and the display system in particular, is crucial for the experience to be convincing (and, indeed, non-nauseating). There are many potential and actual sources of latency in such systems; in this paper, we are concerned only with those introduced by the image generation and display procedures themselves. Already, the above description of a double-buffered display system contains a number of potential sources of lag. Firstly, if the display processor computes the apparent positions of the objects in the image based on positional information valid at the \emph{start} of its computations, these apparent positions will already be slightly out of date by the time the computations are complete. Secondly, the rendering and scan-conversion of the objects takes more time, and is based on the (already slightly outdated) positional information. Finally---and perhaps most subtly---the very ``sample-and-hold'' nature of the video processor's frame buffer leads to a significant average time lag itself, equal to \emph{half the update period}. While a general mathematical proof of this figure is not difficult, a ``hand-waving'' argument is easily constructed. For simplicity, assume that all other lags in the graphical pipeline are magically removed, so that, upon the first refresh of a new update, it describes the virtual environment at that point in time accurately. By the time of the second refresh of the same image, it is now one frame out-of-date; by the third refresh, it is two frames out-of-date; and likewise for all remaining refreshes of the same image until a new update is provided. By the ``hand-waving'' argument of simply averaging the out-of-datedness of each refresh across the entire update period, one obtains
\[
  \langle\textrm{lag}\rangle
   = \frac{0 + 1 + \cdots + (N-1)}{N}\,T_{\mathrm{f}}
   = \frac{N-1}{2}\,T_{\mathrm{f}}
   \approx \frac{T_{\mathrm{u}}}{2},
\]
where $T_{\mathrm{f}}$ is the frame period, $N$ is the number of frames per update, and $T_{\mathrm{u}} = N T_{\mathrm{f}}$ is the update period.
It is this undesirable feature of conventional display methodology that we will aim to remove in this paper. However, to provide suitable background for the approach we shall take, and to put our later specifications into context, we first review some quite general considerations on the nature of physical motion.
\subsection{The Physics of Motion}
\label{Motion}

As noted in section~\ref{Intro}, while our applications for virtual reality technology may encompass virtual worlds far removed from the laws of physics, our physical senses nevertheless expect to be stimulated more or less in the same way that they are in the real world. It is therefore useful to review briefly the evolution of man's knowledge about the fundamental nature of motion, and note how well these views have or have not been incorporated into real-time computer graphics.
Some of the earliest questions about the nature of motion that have survived to this day are due to Zeno of Elea. His most \emph{famous} paradox---that of Achilles and the Tortoise---is amusing to this day, but is nevertheless more a question of mathematics than physics. More interesting is his paradox of the Moving Arrow: At any instant in time, an arrow occupies a certain position. At the next instant of time, the arrow has moved forward somewhat. His question, somewhat paraphrased, was: How does the arrow know how to get to this new position by the very next instant? It cannot be ``moving'' at the first instant in time, because an instant has no duration---and motion cannot be measured except over some duration.
Let us leave aside, for the moment, the flaws that can be so quickly pointed out in this argument by anyone versed in modern physics. Consider, instead, what Zeno would say to us if we travelled back in time in our Acme Time Travel Machine, and showed him a television receiver displaying a broadcast of an archery tournament. (Ignore the fact that, had television programmes been in existence two and a half thousand years ago, Science as we know it would probably not exist.) Zeno would no doubt be fascinated to find that the arrows that moved so realistically across the screen were, in fact, a \emph{series of static images} provided in rapid succession---in full agreement (or so he would think) with his ideas on the nature of motion. The question that would then spring immediately to his lips: \emph{How does the television know how to move the objects on the screen?}
Our response would, no doubt, be that the television \emph{doesn't} know how to move the objects; it simply waits for the next frame (from the broadcasting station) which shows the objects in their new positions. Zeno's follow-up: How does the \emph{broadcasting station} know how to move them? Answer: It doesn't either; it just sends whatever images the video camera measures. And eventually we return to Zeno's original question: How does the real arrow itself ``know'' how to move? Ah, well, that's a question that even television cannot answer.
Ignoring for the moment the somewhat ridiculous nature of this hypothetical exchange, consider Virtual Zeno's first question from first principles. Why \emph{can't} the television move the objects by itself? Surely, if the real arrow somehow knows how to move, then it is not unreasonable that the television might obtain this knowledge too. The only task, then, is to determine this information, and tell it to the television! Of course, this is a little simplistic, but let us fast-forward our time machine a little and see what answers we obtain.
Our next visit would most likely be to Aristotle. Asking him about Zeno's arrow paradox would yield his well-known answer---that would, in fact, be regarded as the ``right answer'' for the next 2300 years: Zeno wrongly assumes that indivisible ``instants of time'' exist at all. Granting Aristotle this explanation of Zeno's mistake, what would his opinions be regarding ``teaching'' the television how to move the objects on its own? His response, no doubt, would be to explain that every object has its \emph{natural place}, and that its \emph{natural motion} is such that it moves towards its natural place, thereafter remaining at rest (unless subsequently subjected to \emph{violent motions}). Heartened by this news, we ask him for a mathematical formula for this natural motion, so that we can teach it to our television. ``Ah, well, I don't think too much of mathematical formulæ,'' he professes, engrossed in a re-run of \emph{I Love Lucy}, ``although I can tell you that heavier bodies fall faster than light ones.'' So much for an Aristotelian solution to our problem.
Undaunted, we tweak our time machine forward somewhat---2000 years, in fact. Here, we find the ageing Galileo Galilei ready and willing to answer our questions. On asking about Zeno's arrow paradox, we find a general agreement with Aristotle's explanation of Zeno's error. On the other hand, on enquiring how a television might be taught how to move objects on its own, we obtain these simple answers: if the body is in \emph{uniform motion}, it moves according to $x = x_0 + vt$; if it is \emph{uniformly accelerated}, it moves according to $x = x_0 + v_0 t + \frac{1}{2}at^2$.
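
In modern terms, ``teaching the television'' these two formulae amounts to nothing more than extrapolating a last-known state forward in time. The following fragment is a minimal sketch of that idea only; the structure and the choice of units (pixels and frames, say) are assumptions made purely for illustration.

\begin{verbatim}
/* Galileo's answers as an extrapolation routine: given the state of a
 * body at its last update, predict its position a time t later.        */
typedef struct {
    double x0;   /* position at the last update                         */
    double v0;   /* velocity at the last update                         */
    double a;    /* (constant) acceleration; zero for uniform motion    */
} MotionState;

double extrapolate_position(const MotionState *s, double t)
{
    /* x = x0 + v0 t            (uniform motion)
       x = x0 + v0 t + a t^2/2  (uniformly accelerated motion)          */
    return s->x0 + s->v0 * t + 0.5 * s->a * t * t;
}
\end{verbatim}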
The tale woven in this section is, admittedly, a little fanciful, but nevertheless illustrates most clearly the thinking behind the methods to be expounded. Very intriguing, but omitted from this account, is the fact that Aristotle's solution to Zeno's arrow paradox, which remained essentially unchanged throughout the era of Galilean relativity and Newtonian mechanics, suffered a mortal blow three-quarters of a century ago. We now know that, ultimately, the ``smooth'' nature of space-time recognised by Galileo and Newton, and which underwent a relatively benign ``warping'' in Einstein's classical General Relativity, must somehow be fundamentally composed of quantum mechanical ``gravitons''; unfortunately, no one knows exactly how. Zeno's very question, ``How does anything move at all?'', is again \emph{the} unsolved problem of physics. But that is a story for another place. Let us therefore return to the task at hand, and utilise the method we have gleaned from seventeenth century Florence.
\subsection{Galilean Antialiasing}
\label{GalAnti}

Consider the rasterised display methodology reviewed in section~\ref{CurrentRasters}. How does its design philosophy fit in with the above historical figures' views on motion? It is apparent that the ``slicing up'' in time of the images presented on the display device, considered simplistically, only fits in well with Zeno's ideas on motion. However, we have neglected the human side of the equation: clearly, if frames are presented at a rate exceeding the viewer's visual system's temporal resolution, then the effective integration performed by the viewer's brain combines with the ``time-sampled'' images to reproduce continuous motion---that is, at least for motion that is slow enough for us to follow visually.
Consider now the ``interpolation'' procedure used by the video processor
and frame buffer.
Is this an optimal way to proceed?
Aristotle would probably have said ``no''---the objects, in the
intervening time between updates, should seek their ``natural places''.
Galileo, on the other hand, would have quantified this criticism: the
objects depicted should move with either constant velocity if free,
or constant acceleration if they are falling; if subject to ``violent
motion'', this would also have to be programmed.
Instead, the sample-and-hold philosophy of section~\ref{CurrentRasters}
keeps each object at one certain place on the display
for a given amount of time, and then makes
it \emph{spontaneously jump} by a certain distance; and so on.
In a sense, the pixmap \emph{has no inertial properties}.
As noted, this \emph{is} the ideal behaviour for an object that is not moving
at all; its manifest incorrectness for a moving object is revealed by
elementary mechanics. Consider how an object travelling at constant apparent
velocity is actually depicted: it is held at one fixed position for the whole
of an update period, and is then jumped discontinuously to its new position
when the next update arrives, and so on, update after update.
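
For concreteness, the spurious part of this depicted motion can be written down in a line or two. This is a sketch only, under the assumptions that updates arrive every $T_{\mathrm{u}}$ seconds, that the true apparent position is $x(t) = vt$, and that frame-level and pixel-level quantisation are ignored:
\[
  x_{\mathrm{displayed}}(t) \;=\; v\,T_{\mathrm{u}}
     \left\lfloor \frac{t}{T_{\mathrm{u}}} \right\rfloor ,
  \qquad
  x_{\mathrm{displayed}}(t) - x(t) \;=\; -\,v\,(t \bmod T_{\mathrm{u}}),
\]
a saw-tooth error of period $T_{\mathrm{u}}$ whose amplitude, $v\,T_{\mathrm{u}}$, is one full update period's worth of motion.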
This spurious motion can also be viewed in another light. If one draws a \emph{space-time diagram} of the trajectory of the object as depicted by the sample-and-hold video display, one obtains a staircase-shaped path. The \emph{correct} path in space-time is, of course, a straight line. The saw-tooth error function derived above is the difference between these two trajectories; the ``jumping'' is the exact spatio-temporal analogue of \emph{the jaggies}---the (spatial) ``staircase'' effect observable when straight lines are rendered in the simplest way on bitmapped (or rectangular-grid-sampled) displays. The mathematical description of this general problem with sampled signals is \emph{aliasing}; in rough terms, high-frequency components of the original image ``masquerade as'', or \emph{alias}, low-frequency components when ``sampled'' by the bitmapping procedure, rendering the displayed image a subtly distorted and misleading version of the original.
As is well-known, however, aliasing \emph{can} be avoided in a sampled signal, by effectively filtering out the high-frequency components of the original signal before they get aliased by the sampling procedure. This technique, applied to any general sampled signal, is termed \emph{antialiasing}; in the field of computer graphics, reference is often made to the \emph{spatial antialiasing} techniques used to remove ``the jaggies'' from scan-converted images. (This is often shortened, in that field, to the unqualified term ``antialiasing''; we shall reject this trend and reinstate the adjective ``spatial''.) For the same reasons, the ``jerky motion'' of sample-and-hold video controllers is thus most accurately referred to as \emph{spatio-temporal aliasing}; any method seeking to remove or reduce it is \emph{spatio-temporal antialiasing}.
One form of spatio-temporal antialiasing is performed every time we view standard television images. Generally, television cameras have an appreciable \emph{shutter time}: any motion of an object in view during the time the (electronic) shutter is ``open'' results in \emph{motion blur}. That such blur is in fact a \emph{good} thing---and not a shortcoming---may be surprising to those unfamiliar with sampling theory. However, the fact that the human eye easily detects the weird effects of spatio-temporal aliasing if motion blur is \emph{not} present, even at the relatively high field rate of 50~Hz (or 60~Hz in the US), can be appreciated by viewing any footage from a modern sporting event, such as the Barcelona Olympics. To improve the quality of the now-ubiquitous slow-motion replay (for which motion blur is stretched to an unnatural-looking extent), such events are usually shot with cameras equipped with \emph{high-speed} electronic shutters, i.e.\ electronic shutters that are only ``open'' for a small fraction of the time between frames. The resulting images, played at their natural rate of 50 fields per second, have a surreal, ``jerky'' look (often called the ``fast-forward effect'', because the fast picture-search methods of conventional video recorders lead to the same unnatural suppression of motion blur). This effect is, of course, simply spatio-temporal aliasing; that it is noticeable to the human eye at 50 fields per second (albeit only 25 \emph{frames} per second) illustrates our visual sensitivity. (For computer-generated displays, for which simulating motion blur may be relatively computationally expensive, increasing the refresh and update rates to above 100~Hz and relying on integration by the CRT phosphor or LCD pixel, and our visual system, may be the simplest solution.)
This \emph{frame}-rate spatio-temporal aliasing, which is relatively easy to deal with, is not usually a severe problem. Our immediate concern, on the other hand, is a much more pronounced phenomenon: the \emph{update}-rate spatio-temporal aliasing produced by the sample-and-hold nature of conventional video controllers (the spurious motion described by equation~\ref{SpuriousConstV}). Correcting the video controller's procedures to remove this spurious motion is thus our major task. Again, we recall Zeno's question: how does the arrow know where to move, if it only knows where it is, not where it's going? The answer, supplied first by Galileo (albeit in a somewhat long-winded form, in pre-calculus days), is that we need to know the \emph{instantaneous time derivative of the position} (i.e.\ the instantaneous velocity) of the object at that particular time, in addition to its position. We shall refer to the use of such information (or, in general, any arbitrary number of temporal derivatives of an object's motion) to perform update-rate spatio-temporal antialiasing as \emph{Galilean antialiasing}. Suggested methods for carrying out this procedure with existing technology are described in the remainder of this paper.
To carry out this task, we need to re-examine the video controller philosophy described in section~\ref{CurrentRasters}. The most obvious observation that strikes one is that, using that design methodology, \emph{velocity information is not provided to the video controller at all!} The reason for this omission is easily understood in historical perspective. \emph{Television} applications for CRTs preceded computer graphics applications by decades. At least initially, all television images were generated by simply transmitting the signal from a video camera, or one of a number of available cameras. However, normal video cameras have no facilities for determining the \emph{velocity} of the objects they view. (Although this is not, in principle, impossible, it would be technically challenging, and quite possibly of no practical use.) Rather, the high frame and field rate of a television picture alone, together with suitable motion blur, were sufficient to convince the viewer of the television image that they were seeing continuous, smooth motion.
When CRTs were first used for computer applications, in vector displays, the voltages applied to the deflection magnets were directly controlled by the video hardware; such displays' only relation to television displays was that they both used CRT technology. However, when simple \emph{rasterised} computer displays became feasible in the early 1970s, it was only natural that their development was built on the vast experience gathered from television technology---which, as noted, has no notion of storing velocity information. In fact, it is only in very recent years that memory technology has been sufficiently advanced that the \emph{physics of the display devices}---rather than the amount of video memory feasible---is now the limiting factor in developing ever more sophisticated displays at a reasonable price. To even contemplate storing the velocity information of a frame---even if it \emph{were} possible to determine such information---is something that would have been unthinkable ten years ago. It is, of course, no coincidence that the field of virtual reality has also just recently become cost-effective: the immature state of processor and memory technology was the critical factor that limited Sutherland's pioneering efforts twenty-five years ago. It is thus no surprise that the fledgling commercial field of virtual reality requires new approaches to traditional problems.
Of course, the very nature of virtual reality, while putting us in the position of requiring rapid updates to the entire display, conversely provides us with the \emph{very} information about displayed objects we need: namely, velocities, accelerations, and so on, rather than just the simple \emph{positional} information that a television camera provides. Now, it is of course a trivial observation that all virtual-world engines already ``know'' about the laws of Galilean mechanics, or Einsteinian mechanics, or nuclear physics---or indeed any system of mechanics that we wish to program into them, either based on the real universe, or of a completely fictional nature. In that context, our rehash of the notions behind Galilean mechanics may seem trivial and unworthy of the effort spent. What existing virtual-world engines do \emph{not} do, however, is \emph{share some of this information with the video controller}. On this front, apparent triviality is magnified to enormous importance; our neglect of these same physical laws is, in fact, creating an artificial and unnecessary degradation of performance in many existing hardware methodologies.
There is no reason for this omission to continue; the physics has been around for over three hundred and fifty years; and, fortunately, the technology is now ripe. The following sections will provide, it is hoped, at least a very crude and simplistic outline of the paths that must be travelled to produce a fully-functional system employing Galilean antialiasing.
\subsection{Galpixels and Galpixmaps}
\label{Galpixels}
As memory---and the processor power necessary to use it---became even more plentiful, rasterised display options ``fanned out'' in a number of ways. At one extreme, the additional memory could be used to simply improve the spatial resolution of the display, while maintaining its bitmapped nature. At the other extreme, the additional memory could be used exclusively to generate a multi-level response for each pixel position---for grey-scale, say, or a choice of colours---without increasing the resolution of the display at all; the resulting memory map, now no longer accurately described as a ``bit'' map, is preferentially referred to as a \emph{pixmap}. In between these two extremes are a range of flexible alternatives; to this day, hardware devices often still provide a number of different ``video modes'' in which they can run.
Increasing the memory availability yet further led, in the 1980s, to the widespread use of \emph{z-buffers}, both in software and, increasingly, hardware implementations (whereby the ``depth'' of each object on the display is stored along with its intensity or colour). We can see here already an extension to the concept of a pixel: we store not only on--off information (as in bitmaps), or simple intensity or colour shading information (as in early pixmaps), but also additional, \emph{non-displayed} information that assists in the image generation process. (Current display architectures also routinely store several more bits of \emph{control} information for each pixel.)
We now extend this concept of a ``generalised pixel'' still further, with our goal of Galilean antialiasing firmly in our sights. As well as storing the pixel shading, z-buffer and control information, we shall also store the \emph{apparent velocity} of each pixel in the pixmap. We use the term \emph{apparent motion} to describe the motion of objects in terms of display Cartesian coordinates: $x$ horizontal, increasing to the right; $y$ vertical, increasing as we move upwards; and $z$ normal to the display, increasing as we move out from the display towards our face. This motion will typically be related to the \emph{physical motion} of the object (i.e.\ its motion through the 3-space that the system is simulating) by perspective and rotational transformations; however, in section~\ref{Enhancements}, more sophisticated transformations are suggested between the apparent and physical spaces.
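
As a rough illustration of the kind of transformation involved (a sketch only: it assumes a simple pinhole projection of focal length $f$, with the object point at physical position $(X, Y, Z)$, depth measured along the viewing axis, and rotations ignored), differentiating the projection $x = fX/Z$, $y = fY/Z$ gives
\[
  \dot{x} \;=\; f\,\frac{\dot{X}Z - X\dot{Z}}{Z^{2}}, \qquad
  \dot{y} \;=\; f\,\frac{\dot{Y}Z - Y\dot{Z}}{Z^{2}},
\]
so that the apparent velocity of a pixel depends both on the physical velocity of the corresponding object point and on its depth.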
Thus, for z-buffered displays (assumed true for the remainder of this paper), three apparent velocity components must be stored for each pixel---one component for each of the $x$, $y$ and $z$ directions. The motional information stored with a pixel, however, need not be limited to simply its apparent velocity. In general, we are free to store as many instantaneous derivatives of the object's motion as we desire. The rate of change of velocity, the \emph{acceleration} vector, may also be stored, and, in principle, higher derivatives still. We shall defer to the next section the process of deciding just how many such motional derivatives we should store with each pixel. For the moment, we shall simply refer to any pixmap containing motional information about its individual pixels as a \emph{Galilean pixmap}, or \emph{galpixmap}. The individual pixels within a galpixmap will be referred to as \emph{Galilean pixels}, or \emph{galpixels}. Of course, in situations where distinctions need \emph{not} be made between these objects and their traditional counterparts, the additional prefix \emph{gal-} may simply be omitted.
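
As a purely illustrative sketch (not a proposed storage format), a velocity-only galpixel and the galpixmap holding it might be declared as follows; the field widths and units are assumptions.

\begin{verbatim}
#include <stdint.h>

/* A first-order (velocity-only) galpixel for a z-buffered display.      */
typedef struct {
    uint32_t colour;     /* displayed shading value, as in a pixmap      */
    uint16_t z;          /* depth, as in a conventional z-buffer         */
    uint8_t  control;    /* per-pixel control bits                       */
    int16_t  vx, vy, vz; /* apparent velocity in display coordinates,    */
                         /* in pixels (or depth units) per frame; a real */
                         /* design would likely use sub-pixel fixed point*/
} Galpixel;

typedef struct {
    int       width, height;
    Galpixel *pixels;    /* width * height galpixels, row by row         */
} Galpixmap;
\end{verbatim}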
Finally, it will be useful to have some shorthand way of denoting the highest-order derivative of (apparent) motion that is stored within a particular galpixmap. To this end, we (tentatively) denote this highest order by $n$, so that $n = 1$ describes a galpixmap storing apparent velocities only, $n = 2$ one storing accelerations as well, and so on.
As will be seen shortly, additional pieces of information, over and above mere motional derivatives, will also be required in order to effectively carry out Galilean antialiasing in practical situations. Although the amount of information thus encoded may vary from implementation to implementation, we shall not at this stage propose any notation to describe it; if indeed necessary, such notation will evolve naturally in the most appropriate way.
\subsection{Selecting a Suitable Galpixmap Structure}
\label{GalpixmapStructure}

We now turn to the question of determining \emph{how much} additional information should be stored with a galpixmap, in order to maximise the overall improvement in the visual capabilities of the system as perceived by the viewer. Such questions are only satisfactorily answered by considering \emph{psychological} and \emph{technological} factors in equal proportions. That a purely technological approach fails dismally is simply shown: consider the sample-and-hold video controller philosophy described in section~\ref{CurrentRasters}, as (successfully) applied to static objects on the display. We noted there that the video controller effectively boosted the perceived information rate of the display from the \emph{update} rate up to the \emph{refresh} rate, simply by repeatedly showing the same image. Shannon's information theory, however, tells us that this procedure \emph{does not}, in fact, increase the information rate one bit: the repeated frames contain no new information---as, indeed, can be recognised by noting that the viewer could, if she wanted to, reconstruct these replicated frames ``by hand'' even if they were not shown. Thus, even though we \emph{know} that frame-buffered rasterised displays ``look better'' than display systems without such buffers (e.g.\ vector displays), information theory tells us that, in a raw mathematical sense, the frame buffer itself doesn't do anything at all---a fact that must be somewhat ironically amusing to at least one of Shannon's former PhD students.
Raw mathematics, therefore, does not seem to be answering the questions we are asking. A better way to view this \emph{apparent} increase in information rate is to examine the viewer's subconscious prejudices about what her eyes see. She may not, in fact, even realise that the display processor \emph{is} only generating one update every so often: to her, each frame looks just as fair dinkum as any other. All of this visual information---a static image---is simply preprocessed by her visual system, and compared against both ``hard-wired'' and ``learnt'' consistency checks. Is a static image a reasonable thing to see? Did I really see that? Was I perhaps blinking at the time? Am I moving or am I stationary? What do I \emph{expect} to see? It is the lightning-fast evaluation of these types of question that ultimately determines the ``information'' that is abstracted from the scene and passed along for further cogitation. In the case described, assuming (say) a stationary viewer sitting in front of a fixed monitor, all of the consistency checks balance: there appears to be a fair-dinkum object sitting in front of her. In other words, the display is providing sufficient information for her brain to conclude that the images seen are consistent with what would be seen if a real object were sitting there and reflecting photons through a transparent medium in the normal way; that is all that ultimately registers.
We now turn again to our litmus test: an object with a \emph{uniform apparent velocity} being depicted on the display. Using a conventional sample-and-hold frame buffer, the object is simply held fixed between updates, with the spurious jumping motion already described. Let us now assume that the display system is not a conventional one, but rather one whose frame buffer is a galpixmap storing each pixel's apparent velocity, and whose video controller, on every frame refresh between updates, redraws each galpixel shifted from its previous position by that stored velocity (to the nearest pixel).
Ignore, for the moment, that the procedure described seems to double the amount of time for the video controller to do its work. (A moment's reflection reveals that, in any case, there is no fundamental reason why the new-frame-drawing procedure cannot occur at the same time that the \emph{previous} frame is being scanned to the display device.) What will the viewer think that she is seeing? Well, the object will clearly jump a small distance each frame---with each jump exactly the same size as the last (at least, to the nearest pixel), until the new update is available. If the object really \emph{is} travelling with constant apparent velocity (our assumption so far), then upon receipt of the new image update, the object will jump \emph{the same} small distance from the last (video-controller-generated) frame as it has been jumping in the mean time (assuming focus-switching is appropriately synchronised, of course). Now, the \emph{refresh} rate of the system is assumed to be significantly faster than the visual system's temporal resolution; therefore, the motion will look convincingly like uniform motion. Uniform motion has been Galilean antialiased!
Let us examine, now, what ``residual'' motion we are left with when this uniform motion is ``subtracted off'', via a Galilean transformation, from the perceived motion. We now---thankfully---do not end up with the horrific expression of equation~\ref{SpuriousConstV}, but rather with an expression that is \emph{almost} zero. In the setup described, the error is not \emph{precisely} zero: if the apparent velocity of the object does not happen to be some integral number of pixels per frame, then the best we can do is move the pixel to the ``closest'' computed position, leading to a small pseudo-random saw-tooth-like error function in space-time; here we are simply hitting the fundamental physical limits of our display system. However, the fact that the \emph{amplitude} of the error is at most one pixel in the spatial direction, and one frame in the temporal direction, means that it is a vastly less obtrusive form of aliasing than the gross behaviour described by equation~\ref{SpuriousConstV}. (If so desired, however, even this small amount of spatio-temporal aliasing can be removed with suitable trickery in the video controller; but we shall not worry about such enhancements in this paper.)
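
A highly simplified sketch of this per-frame step follows, re-using the illustrative \texttt{Galpixel} and \texttt{Galpixmap} declarations sketched earlier. It deliberately ignores depth conflicts between pixels arriving at the same location, the background uncovered by departing pixels, and sub-pixel accumulation of velocity; the routine name is a placeholder.

\begin{verbatim}
#include <string.h>

/* Between updates, produce the next frame by shifting every galpixel
 * from its old position by its stored apparent velocity (taken here,
 * for simplicity, as a whole number of pixels per frame).              */
void extrapolate_one_frame(const Galpixmap *src, Galpixmap *dst)
{
    /* start from an empty (background) destination frame               */
    memset(dst->pixels, 0,
           (size_t)dst->width * dst->height * sizeof(Galpixel));

    for (int y = 0; y < src->height; ++y) {
        for (int x = 0; x < src->width; ++x) {
            const Galpixel *p = &src->pixels[y * src->width + x];
            int nx = x + p->vx;          /* nearest-pixel extrapolation  */
            int ny = y + p->vy;
            if (nx < 0 || nx >= dst->width || ny < 0 || ny >= dst->height)
                continue;                /* has moved off the display    */
            dst->pixels[ny * dst->width + nx] = *p;
        }
    }
}
\end{verbatim}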
Having successfully convinced the viewer of near-perfect constant motion, let us now worry about what happens if the object in question is, in fact, being \emph{accelerated} (in terms of display coordinates), rather than moving with constant velocity. For simplicity, let us assume that the object is undergoing \emph{uniform acceleration}. Fortunately, such a situation is familiar to us all: excluding air resistance, all objects near the surface of the earth ``fall'' by accelerating (``by the force of gravity'', in 19th century terminology) at the same constant rate. How do our various display systems cope with this situation?
Let us assume that the object in question is initially stationary,
positioned near
the ``top'' of the display.
Let us further assume that the acceleration has the value 2~pixels per
frame per frame.
Firstly, let us consider the optimal situation: the display
processor is sufficiently fast to update the object each frame.
Clearly, if we shift our axes in such a way that $y = 0$ corresponds to the
initial position of the object, its vertical position in successive frames
will be given by
\begin{equation}
  -y = 0, 1, 4, 9, 16, 25, 36, 49, 64, 81, \ldots,
  \label{AccelBest}
\end{equation}
as is verified from the formula $y = -\frac{1}{2}at^{2} = -t^{2}$, with $t$
measured in frames and $a = 2$ pixels per frame per frame.
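
A few throwaway lines suffice to check this sequence; nothing here is assumed beyond the parameters already stated (starting from rest, with $a = 2$ pixels per frame per frame).

\begin{verbatim}
#include <stdio.h>

/* Print -y for the first ten frames of the falling object:
 * y = -(1/2) a t^2, a = 2 pixels per frame per frame, so -y = t^2.     */
int main(void)
{
    const int a = 2;
    for (int t = 0; t < 10; ++t)
        printf("frame %d: -y = %d\n", t, a * t * t / 2);
    return 0;
}
\end{verbatim}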
Let us now examine how this object is depicted on a
Let us now return to the case of the falling object, and determine how
its motion will be depicted on the
It is straightforward to compute how the
Let us now consider making these last two
examples just a tad more realistic.
Keeping the other parameters constant, let us assume that the display
processor is now only fast enough to generate one update every \emph{three}
frames.
Our dust-gathering
Clearly, this would not be in public circulation were this
problem a real one, rather than a case of mathematics-gone-wild.
Already, we have seen that using a
We first return to the above thought experiment of
using a
Let us now entertain the notion that we could, at this stage, be
forever more happy with our
The main problem with our simplistic examples above, which will show
why they are not completely representative of the real world,
is that they are all \emph{one-dimensional}.
Of course, we can transform the ``falling'' example to a ``projectile''
situation by simply superimposing an initial horizontal velocity on
the vertical motion; this would simply require applying the uniform motion
and uniform acceleration cases conjointly.
However, we have also simplified the world by only talking about some
unspecified, featureless, Newtonian-billiard-ball-like ``object'',
without worrying about the extended
three-dimensional structure of the object.
A 2-D world, of course, is infinitely more interesting than
a 1-D world, not least because it becomes possible to
step \emph{around} other people, instead of simply bouncing into them.
In a ``2
In fact, a fairly simple example will suffice to
illustrate the problems that are encountered when we move up from
1-D to 2-D or 3-D motion.
Consider a two-dimensional square, in the plane of the display,
which is also \emph{uniformly rotating} in that same plane about the centre
of the square.
Choose any point on the boundary of this square.
The path traced out by this point in time will be a \emph{circle} (as is
simply verified by considering that point alone, without the complication
of the rest of the square).
Now, a
Now consider how a
It might be argued that this example merely shows that you cannot
stretch the Galilean antialiasing procedure indefinitely; ultimately, you
must update the display at a \emph{reasonable} rate, even if that is
considerably lower than the refresh rate.
To a certain extent, this is indeed true.
However, rejecting
This purely mathematical line of reasoning, while most helpful in our deliberations, nevertheless again overlooks the fact that, ultimately, all that matters is what the viewer \emph{thinks} she is seeing on the display, not how fancy we are with our mathematical prowess. To investigate this question more fully, it is necessary to perform some investigations of a psychological nature that are not completely quantitative. These deliberations, however, shall require an additional piece of equipment, readily available to the author, but (unfortunately) not to all workers: a \emph{Melbourne tram}. (Visitors to the mecca of Seattle can, however, make use of the authentic Melbourne trams that the city of Seattle bought from the Victorian Government ten years ago, which now trundle happily along the waterfront with a view of Puget Sound rather than Port Phillip Bay.) Melbourne trams (the 1920s version, not the new ``torpedo'' variety) have the unique property that, no matter how slowly and carefully they are moving, they always seem to be able to hurl standing passengers spontaneously into the lap of the nearest seated passenger (which may or may not be an enjoyable experience, depending on the population of the tram). This intriguing (if slightly frivolous) property of such vehicles can actually be used for a reasonably quantitative experimental investigation of the kinematical capabilities of humans.
Consider a Melbourne tram located in the Bourke Street Mall, sitting at the traffic lights controlling its intersection with the new Swanston Street Mall. A passenger standing inside the tram looks out the window. Apart from wondering why on earth Melbourne needs so many malls---or indeed why one needs traffic lights at all at the intersection of two malls---the passenger is unperturbed; she can take a good look at the surrounding area. She drops a cassette from her Walkman into her handbag: it falls straight in.
Now consider the same tram thirty seconds later, as it is moving at a constant velocity along Bourke Street. The standing passenger is again simply standing around; cassettes fall straight down; if it were not for the passing scenery, she wouldn't even know she was moving. And, of course, physics assures us that this is always the case: inertial motion cannot be distinguished from ``no'' motion, except with reference to another moving object. The laws of physics are the same.
We now take a further look at our experimental tram: it is now accelerating across the intersection at Russell Street. Old Melbourne trams, it turns out, have quite a constant rate of acceleration when their speed controls are left on the same ``notch'', at least over reasonable time periods. Let us assume that this acceleration is indeed constant. With our knowledge of physics, we might predict that our standing passenger might have to take some action to avoid falling over: the laws of Newtonian physics \emph{change} when we move to an accelerated frame. Inertial objects do not move in straight lines. Cassettes do not fall straight down into handbags. Surely this is a difficult environment in which to be merely standing around?
Somewhat flabbergasted, we find our passenger standing in the accelerated tram, not holding onto anything, unperturbedly reading a novel. How is this possible? Upon closer examination, we notice that our passenger is \emph{not} standing exactly as she was before: \emph{she is now leaning forward}. How does this help? Well, to remain in the same position in the tram, our passenger must be accelerated at the same rate as the tram. To provide this acceleration, she merely leans forward a little. The force on her body due to gravity would, in a stationary tram, provide a torque that would topple her forwards. Why does this not also happen in the accelerating-tram case then? The answer is that the additional forward frictional force of the tram's flooring on her \emph{shoes} provides both a counter-torque to avoid her toppling and the forward force necessary to accelerate her at the same rate as the tram. Looked at another way, were she to \emph{not} lean forward, the forward frictional force of the accelerating tram would produce an unbalanced torque on her that would topple her \emph{backwards}.
Leaning forward at the appropriate angle is, indeed, a fine trick on the part of our passenger. Upon questioning, however, we find to our dismay that she knows nothing about Newtonian mechanics at all. We must therefore conclude that ``learning'' this trick must be something that humans do spontaneously---everyone seems to get the hang of it pretty quickly.
It should be noted that \emph{this trick would not work in space}. Without borrowing the gravitational force, there is no way to produce a counter-torque to that provided by the friction on one's shoes. Of course, this friction \emph{itself} should not be relied on too much: without any gravitational force pushing one's feet firmly into the floor, there may not be any friction at all! This means that there will, in general, be no unbalanced torque to topple one backwards anyway: the passengers in an accelerating space vehicle, (literally) hanging around in mid-air, will simply continue to move at constant velocity. But the accelerating vehicle will then catch up to them: they will slam against the \emph{back} wall of the craft! Of course, this is precisely Einstein's argument for the Equivalence Principle between gravity and acceleration---if you are a standing passenger, you had better find what will become the ``floor'' in your accelerated spacecraft quick smart---but it shows that life on earth has prepared us with different in-built navigation systems than what our descendants might require.
What lessons do we learn from this? Firstly, it is unwise to put a Melbourne tram into earth orbit. More importantly, it shows that humans are quite adept at both extrapolating the effects of constant acceleration (as shown by our ability to catch projectiles), as well as being able to function quite easily in an accelerated environment (as shown by our tram passenger).
Let us now examine the tram more closely \emph{before} it accelerates away. It is now sitting at the traffic lights at Exhibition Street. The lights turn green. The driver releases the tram's air-brakes with a loud hiss. Our passenger spontaneously takes hold of one of the overhanging stirrups! More amazingly, another passenger spontaneously starts to fall forwards! The tram jerks, and accelerates away across Exhibition Street. Our passenger, grasping the stirrup, absorbs the initial jerk and, as the tram continues on with a constant acceleration, lets go in order to turn the page of her novel. The second passenger, the spontaneously-falling character, magically did not fall down at all: the tram accelerated at just the right moment to hold him up---and there he is, still leaning forward like our first passenger! However, all is \emph{not} peaceful: a Japanese tourist, who boarded the tram at Exhibition Street, has tumbled into the lap of a (now frowning) matronly figure, and is struggling to regain his tram-legs.
What do we learn, then, from this experience? Clearly, the Japanese tourist represents a ``normal'' person. Accustomed to acceleration, but \emph{not} to the discontinuous way that it is applied in Melbourne trams, he became yet another veteran of lap-flopping. Our first passenger, on the other hand, who grabbed the stirrup upon hearing the release of the air-brakes, had clearly suffered the same fate in the distant past, and had learnt to recognise the audible clue that the world was about to shake: such is the Darwinian evolution of a Melburnite. The second passenger, who spontaneously fell forward, appears to be an even more experienced tram-dweller: by simply falling forward he had no need to grasp for a stirrup or the nearest solid structure. This automatic response, which relies for its utility on the fact that Melbourne tram-driving follows a fairly standard set of procedures, is perhaps of interest to behavioural scientists, but does \emph{not} indicate that the passenger had any infallible ``trick'' for avoiding the effects of jerks (to employ Feynman's term)---the ``falling'' method does not work at all if the jerk is significantly delayed for some unknown reason. (A somewhat mischievous tram driver once confessed that his favourite pastime was releasing the air-brake and then not doing anything---and then watching all of the passengers fall over.) Of course, the Melbourne trams on the Seattle waterfront have an extra reason for unexpected deceleration: in a city covered by decrepit, unused \emph{train} tracks, motorists turning across the similar-looking \emph{tram} tracks get the fright of their lives when they find a green, five-eyed monster bearing down on them!
Returning, now, to the task at hand, our admittedly simplified examples above show that people are, in general, relatively adept at handling <#635#>acceleration<#635#> (not surprising, considering our need to deal with gravity), but not too good when it comes to <#636#>rate of change<#636#> of acceleration, or <#637#>jerk<#637#>. Numerous other examples of this general phenomenon can be constructed: throw a ball and a person can usually catch it; but half-fill it with a liquid, so that it ``swooshes around'' in the air, and it can be very difficult to grab hold of. Sitting in a car while it is accelerating at a high rate ``feels'' relatively smooth; but if the driver suddenly lets off the accelerator just a little, your head and shoulders go flying forward---despite the fact that you are still being accelerated in the <#638#>forward<#638#> direction! In each of these examples, it is the (thus appropriately named) <#639#>jerk<#639#> that throws our inbuilt kinematical systems out-of-kilter; not too surprisingly, it is difficult to formulate uncontrived examples in the <#640#>natural<#640#> world in which jerks are prevalent (apart from falling out of a tree, of course---but repeatedly hitting the ground is not a technique well suited to evolutionary survival).
We now have two pieces of information on which to base a decision about what n should be in a practical system; the remaining consideration is what motional information the physical transducers themselves can realistically supply.
Clearly, positional--rotational data is the common
denominator among existing transducer technology: find some
physical effect that lets you determine how far away the participant
is from a number of fixed sensors, as well as her orientation with
respect to these sensors, and you ``know where she is''.
Transducers for <#644#>velocity<#644#> information---which tell you
``where she's going'' (but not ``where she is'')---are also
commonplace in
commercial industry, albeit less common in virtual-reality systems.
However, even if such transducers are <#645#>not<#645#> used, quite a reasonable
estimate of the true velocity of an object may be obtained by taking
differences in positional data (as was used to calculate
the results in <#646#>ComputeVel<#646#>).
On the other hand, this ``numerical differentiation'' carries two
inherent dangers: firstly,
computing <#647#>any<#647#> differentiation on physical data
enhances any high-frequency noise present; and, secondly, performing
a <#648#>discrete-time<#648#> numerical differentiation introduces lags into the
data (in rough terms, you need to wait until t = 5 before you can compute
a velocity that is really representative of t = 4 or so).
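To make these two dangers concrete, consider the following minimal sketch (written here in C purely for illustration; the type and function names are ours, not those of any existing system) of velocity estimation by backward differences:

\begin{verbatim}
/* Illustrative sketch only: estimate velocity from two successive
 * position samples taken dt seconds apart.  Position noise of size e
 * becomes velocity noise of size roughly e/dt (the first danger), and
 * the result really describes the motion at a time dt/2 in the past,
 * not "now" (the second danger).                                      */
typedef struct { double x, y, z; } vec3;

vec3 estimate_velocity(vec3 prev_pos, vec3 curr_pos, double dt)
{
    vec3 v;
    v.x = (curr_pos.x - prev_pos.x) / dt;
    v.y = (curr_pos.y - prev_pos.y) / dt;
    v.z = (curr_pos.z - prev_pos.z) / dt;
    return v;
}
\end{verbatim}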
In a similar way, acceleration can either be measured directly, or computed from the velocity data by discrete numerical differentiation. In many respects, the laws of Nature make accelerometers <#650#>easier<#650#> to make than speedometers. This is, of course, due to the fact that <#651#>uniform motion is indistinguishable from no motion at all<#651#>, as far as Newtonian mechanics is concerned: it is vitally necessary to measure velocity ``with respect to something'' (such as the road, for an automobile; or the surrounding air, for an aeroplane---``ground speed'' being much more difficult to measure because there is [hopefully] no actual contact with the ground!). On the other hand, <#652#>accelerations<#652#> cause the laws of physics to change in the accelerated frame, and can be measured without needing to ``refer'' to any outside object. (Of course, this is not completely true: if (classical) laws of physics are written in a <#653#>generally relativistic<#653#> way, they will also hold true in accelerated frames; but that is only of academic interest here.) Nevertheless, even though these properties make acceleration inherently easier to measure than velocity, the designer must ultimately worry about both minimising the cost of the system, and minimising the number of gadgets physically attached to the participant---and it is unlikely that <#654#>both<#654#> velocity and acceleration transducers would be deemed necessary; one or the other (or, indeed, both) would be omitted. Of course, acceleration may be deduced from velocity data numerically, and carries the same dangers as velocity data obtained numerically from positional data. Most dangerous of all is if acceleration data must be numerically obtained from velocity data that was <#655#>itself<#655#> obtained numerically from positional data; the errors compound.
Of course, it is also possible to actually <#656#>omit<#656#> measuring positional information altogether, and instead obtain it by integrating measured velocities. This integration actually <#657#>reduces<#657#> high frequency noise---but what one gains on the swings one loses on the roundabouts: the <#658#>low<#658#> frequency noise is boosted---manifested, of course, in ``drift'' in the measured origin of the system, which must be regularly calibrated by some other means. Alternatively, the relative simplicity of the physics may lead a designer to simply use <#659#>accelerometers<#659#> as transducers, integrating this information once to obtain velocity data, and a second time to obtain positional data. Of course, this double-integration suppresses high-frequency noise even further, but requires regular calibration of not only the origin of the <#660#>positional<#660#> system, but also of the origin of the <#661#>velocity<#661#> information (i.e., knowing when the transducer is ``stationary'' with respect to the laboratory)---which is again a manifestation of the general Wallpaper Bubble Conservation Law (i.e., whenever you get rid of one problem it'll usually pop up somewhere else).
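To see where the ``drift'' comes from, the following sketch (again purely illustrative; the names are ours) dead-reckons one coordinate from accelerometer samples:

\begin{verbatim}
/* Illustrative sketch only: double integration of accelerometer
 * samples taken dt seconds apart.  A constant bias b in measured_accel
 * produces a velocity error b*t and a position error b*t*t/2 -- the
 * low-frequency "drift" referred to in the text, which must be
 * recalibrated away against some absolute reference.                  */
typedef struct { double pos, vel; } state1d;

void integrate_accel(state1d *s, double measured_accel, double dt)
{
    s->pos += s->vel * dt + 0.5 * measured_accel * dt * dt;
    s->vel += measured_accel * dt;
}
\end{verbatim}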
Keeping in mind the above technical and design concerns involved in
determining even positional, velocity and acceleration information
from physical transducers, what chance is there for us to extend this
methodology to measuring <#662#>jerk<#662#> data?
On the physics side, there are few (if any) genuinely simple physical
effects that could inspire the design of a <#663#>jerkometer<#663#> (to coin a
somewhat obscene-sounding term).
It is not even clear that a jerkometer would be all that reliable
an instrument anyway: since (by Newton's Second Law) the forces acting on a
test mass reveal its acceleration, not its jerk, any such device would, in
effect, have to differentiate an acceleration signal internally, with all
the noise problems that that entails.
On the numerical-differentiation side, on the other hand, whether it would be wise or not to perform an extra differentiation of the acceleration data to obtain the jerk data depends largely on where the acceleration information originates. If it is obtained from an actual physical accelerometer, such a numerical differentiation would probably be reasonable. However, if the physical transducer is in fact a velocity- or position-measuring device, then one would not wish to place too much trust in a second- or third-order numerical derivative for a quantity that is already subject to concerns in terms of basic physics: most likely, all one would get would be a swath of potentially damaging instabilities in the closed-loop system. Thus, a numerical approach depends intimately on what order of positional information is actually yielded by the physical transducers.
We now make the following suggestion: The visual display architecture of virtual-reality technology should make only <#675#>minimal<#675#> assumptions about the nature of the physical transducers used elsewhere in the system. This suggestion is based, of course, on the concept of <#676#>modularity<#676#>: if groups of functional components in any system cohere, by their very intrinsic nature, into readily identifiable ``modules'', then any one ``module'' should not be made unnecessarily and arbitrarily dependent on the internal nature of another ``module'' <#677#>unless<#677#> the benefits gained by the whole outweigh the loss of encapsulation of the one. If this suggestion is accepted (which, in some proprietary situations, may require consideration of the future health of the industry rather than short-term commercial leverage), then it is clear that it would be inappropriate for a display system to assume that a given environment obtains anything more than raw positional--rotational information from physical transducers. With such a minimalist assumption, our above considerations show that it would <#678#>not<#678#> be wise, in general, to insist that jerk information about the physical motion of the participant be provided to the display system. Of course, this does not prevent jerk information being obtained about the other <#679#>computer-generated<#679#> objects in the virtual world---their trajectories in space are (in principle) knowable to arbitrary accuracy; arbitrary orders of temporal derivative may be computed with relative confidence. However, if a display system <#680#>did<#680#> use jerk information for virtual objects, but not for the participant herself, it would all be for naught anyway. To appreciate this fact, it is only necessary to note that <#681#>all<#681#> of the visual information generated in a virtual world scenario <#682#>depends solely on the relative positions of the observer and the observed<#682#>. Differentiating the previous sentence an arbitrary number of times, it is clear that the <#683#>only<#683#> relevant velocities, accelerations, jerks, and so on, in a Galilean antialiased display environment are the <#684#>relative<#684#> velocities, accelerations, jerks, and so on, of the observer and the observed. Of course, this property of <#685#>relativity<#685#> is a fundamentally deep and general principle of physics, whether it be Galilean Relativity, or Einstein's Special or General Relativity; it will probably not, however, be an in-built and intuitively obvious part of humanity's subconscious until we more regularly part company with <#686#>terra firma<#686#>, and travel around more representative areas of our universe. (Ever played <#687#>Wing Commander<#687#>? Each Terran spacecraft has a maximum speed. But <#688#>with respect to what?!<#688#> Galileo would turn in his grave....) Thus, using jerk information for one half of the system (the virtual objects) but not the other half (the participant) brings us no benefits at all---and, indeed, the inconsistencies in the virtual world that would result may well be a significant degradation.
Returning, again, to the task at hand, the above deliberations
indicate that, all in all, the most appropriate order of Galilean
antialiasing that should be used,
at least for virtual-reality applications, is probably that which
extrapolates terms up to and including acceleration, but no higher.
<#691#>MinimalImplementation<#691#><#692#>A Minimal Implementation<#692#> The previous sections have been concerned with the development of the underlying philosophy of, and abstract planning for, Galilean antialiasing in general. In this section, we turn directly to the practical question of how one might retrofit these methods to existing technology. Section~<#693#>MinHardwareMods<#693#> outlines the minimal modifications to existing display control hardware that must be implemented; section~<#694#>MinSoftwareMods<#694#> describes, in general terms, the corresponding software enhancements necessary to drive the system. More advanced enhancements to the general visual-feedback methodology of Galilean antialiasing---which would, by their nature, be more amenable to implementation on new, ground-up developments---are deferred to section~<#695#>Enhancements<#695#>.
<#696#>MinHardwareMods<#696#><#697#>Hardware Modifications and Additions<#697#>
Our first task in modifying an existing system
is to determine precisely what changes must be made
to its hardware:
if such changes are technically, financially or politically
unattainable,
then a retrofit will not be possible at all, and further speculation
would be pointless.
Clearly, the area of existing hardware most affected by such a retrofit is the frame buffer and video controller subsystem.
As noted in section~<#707#>Galpixels<#707#>, the galpixmaps that will be stored in the (now necessarily multiple) frame buffers extend significantly on the simple intensity or colour information stored in a regular pixmap. However, there is clearly no need for a galpixmap to be <#708#>physically<#708#> configured as a rectangular array of galpixel structures (in terms of physical memory); rather, a much more sensible configuration---especially in a retrofit situation---is to maintain the existing hardware for the frame buffer pixmap (and duplicate it, where necessary), and construct new memory device structures for storing the additional information, such as velocity and acceleration, that the galpixmap requires. The advantage of this approach is that, properly implemented, the detailed circuitry responsible for actually scanning out the frame buffer to the physical display device may be left unchanged (apart, perhaps, from including a frame-buffer multiplexer if hardware double-buffering is not already employed by the system). This is a particularly important simplification for retrofit situations, since, in general, the particular methodology employed in the scan-out circuitry depends largely on the precise nature of the display technology used.
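By way of illustration only (the field names, bit widths and dimensions below are purely nominal, and are refined in the remainder of this section), this ``parallel planes'' organisation might be declared as follows; the scan-out circuitry continues to read nothing but the colour plane:

\begin{verbatim}
/* Illustrative sketch only: the existing colour pixmap is kept as-is,
 * and the extra galpixel information lives in parallel memory planes
 * addressed by the same (row, column) as the colour plane.  In a real
 * system each plane maps onto its own physical memory devices.        */
#define ROWS 1000
#define COLS 1000

typedef struct {
    unsigned char colour[ROWS][COLS][3];  /* existing frame buffer (RGB) */
    short         vel_x [ROWS][COLS];     /* new plane: x velocity       */
    short         vel_y [ROWS][COLS];     /* new plane: y velocity       */
    short         acc_x [ROWS][COLS];     /* new plane: x acceleration   */
    short         acc_y [ROWS][COLS];     /* new plane: y acceleration   */
    int           z     [ROWS][COLS];     /* z-buffer value              */
    int           z_vel [ROWS][COLS];     /* new plane: z velocity       */
    int           z_acc [ROWS][COLS];     /* new plane: z acceleration   */
} galpixmap;
\end{verbatim}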
We now turn to the question of what information <#709#>does<#709#> need to be
stored in the extended memory structures that we are adding to each
frame buffer.
Clearly, the (``display'', or ``apparent'')
<#710#>position vector<#710#> of the i-th galpixel,
The <#714#>velocity vector<#714#> of the i-th galpixel,
The next questions that must be considered are:
What numerical format should we store the velocity and acceleration
information in?
How many bits will be needed?
These questions are of extreme importance for the
implementation of any Galilean-antialiased technology.
That this is so can be recognised by calculating just how many such
quantities need to be stored in the video hardware subsystem:
We need both a velocity and an acceleration value for every galpixel in a
frame buffer.
Velocity and acceleration each have three components.
We need at least three such frame buffers for each display device
(two for the video controller to propagate between, and one for the
display processor to play with at the same time);
and, for stereoscopic displays, we will need two (logical) display devices.
Even for a bare-minimum display resolution, the number of quantities to be stored therefore runs well into the millions.
Let us, therefore, make some crude estimates as to how we would
like our physical system to perform.
Assume, for argument's sake, that we are implementing a 50~Hz refresh
rate display system;
each frame period is then 20 milliseconds.
Propagating quantities forward with a finite numerical accuracy
leads to accumulated errors that increase with time.
In particular, the position of an object will be ``extrapolated'' poorly
if we retain too few significant figures in the velocity and
acceleration---even ignoring the fact that the acceleration of the
object may have, in fact, changed in the mean time.
How poor a positional error, arising from numerical accuracy alone,
can we tolerate?
Let us say that this error should be no worse than a single pixel or so.
But the error in position will, in a worst case scenario, increase
linearly with time, i.e., with the number of extrapolated frames.
How many frames will we want to extrapolate forwards while still
maintaining one-pixel accuracy?
Well, since we will be using binary arithmetic eventually, let's choose
a power of 2---say, 16 frames.
This corresponds to an inter-update
time
(i.e., the time for which the video controller itself happily propagates
the motion of the galpixels between display processor updates)
of 320~milliseconds---which should
be <#722#>more<#722#> than enough, considering that the participant's acceleration
(not to mention that of the objects in the virtual world)
will have no doubt
changed by a reasonable amount by then---and the view, if it has
not been updated by the display processor, will thus be reasonably
inaccurate anyway.
Of course, the whole display system won't suddenly fall over if, in
some situation, we don't actually get a display processor
update for more than 16 frames---it is just that the inherent
numerical inaccuracy
of the propagation equations will simply grow larger than 1~pixel.
We shall say that the display system is <#1666#>rated for a propagation time of 16 frames<#1666#>.
OK, then, how do we use our design choice of a 16-frame rated propagation time?
Now, using <#741#>PropPos<#741#> and <#742#>PropVel<#742#>, how many fractional bits do
we need to store for the quantities that appear in them?
Recovering our composure, we ask:
What happens if we <#747#>are<#747#> restricted to having integral pixel positions only?
Regaining our composure again, let us reconsider the reason why the
proverbial hit the fan in the first place.
Our basic problem is that a pixmap matrix has no such thing as a
``fractional row'' or ``fractional column''.
Maybe we could increase the resolution of our display...and call
the extra pixels ``fractions''?
Hardly a viable proposition in the real world---and
in any case we'd be simply palming off the
problem to the <#750#>new<#750#> pixels.
Maybe we could leave the display at the same resolution, but replace
each entry in the galpixmap with a little galpixmap matrix of its own?
Then we could have ``fractional rows and columns'' no problems!
Well, how big would the little matrix need to be?
Seeing as a velocity of vxi = 1 pixel per frame moves us by
16 pixels in 16 frames---and we only want to move one pixel,
max., in this time period---we should therefore reduce the minimum
computable velocity to 1/16 of a pixel per frame.
This, then, requires a 16 x 16 submatrix of ``baby galpixels'' at each
original galpixel location.
Regaining our composure for the third (and final) time, let us consider this last proposal a little more rationally. We uncontrollably assumed that what was needed at each pixel location was a little galpixmap, with all the memory that that requires. Do we really need all this information? What would it mean, for example, to have a whole lot of little ``baby galpixels'' moving around on this hugely-expanded grid? What if the babies of two different (original-sized) galpixels end up on the same (original-sized) galpixel submatrix---does such a congregation of babies make any conceptual sense? Well, our display device only has <#752#>one<#752#> pixel per submatrix: so who gets it? Do we add, or average, the colour or intensity values for each of the baby galpixels? No---the object <#753#>closer<#753#> to the viewer should obscure the other. Should it be ``most babies wins''? No, for the same reason. Then is there any reason for having baby galpixels at all? It seems not.
Let us, therefore, look a little more closely at these last considerations. Since the display device only has one physical pixel per stored galpixel (of the original size, that is---the baby galpixels having now been adopted out), then, obviously, a galpixel can only move one pixel at a time anyway. But we only want to move the galpixel by one pixel every 16 frames---or every 3 or 7 or 13 frames, or whatever time period will, on the average, give us the right average apparent velocity of the galpixel in question. So how does one specify that the video controller is to ``sit around'' for some number of frames before moving the galpixel? Simple---put in a little counter, and tell it how many frames to wait. Of course, we need a little counter for each galpixel, but it need only count up to 16, so it only needs 4 bits anyway (in each of the x and y directions)---not a large price. Thus, we <#754#>can<#754#> get sub-pixel-per-frame velocities, with only a handful extra bits per galpixel!
Let us look at this ``counter'' idea from a slightly different direction.
Just say that we have told the video controller to count up to 16 before
moving this particular galpixel one pixel to the right.
Why not <#755#>pretend<#755#> that, on each count, the galpixel
``really <#756#>is<#756#>'' moving
1/16 of a pixel to the right---just that we don't actually see
it move because
our display isn't of a high enough resolution.
Rather, the galpixel says to itself on each count, ``Hmm, this display
device doesn't have any fractional positions; I'll just throw away the
fraction and stay here.''
But then, upon reaching the count of 16, the galpixel says, ``Hey, now
I'm supposed to be 16/16 pixels to the right---but that's one <#757#>whole<#757#>
pixel, and I can do that!''
Clearly, this is a better description for the counter than our original
one---we now know what to do if counting, say, by 3s---namely, we add
3/16 to the fractional position on each count, and move one whole pixel
(keeping the left-over fraction) whenever the accumulated total reaches a
whole pixel.
And so we come---by a rather roundabout route, to be sure---to the
conclusion that the <#758#>simplest<#758#> way to allow sub-pixel-per-frame
velocities is to ascribe to each galpixel two additional attributes:
a <#759#>fractional position<#759#> in each of the x and y directions.
The above roundabout explanation has, as a consolation prize, already
told us how many bits of fractional
positional information we require for a system rated for 16 frames of propagation: four bits in each of the x and y directions.
There is, however, a slightly undesirable feature of the above
specification of the action of fractional position, that we must now
repair.
In the example given, the galpixel ``moved'' 1/16 of a pixel
each frame; on the 16th frame it moved to the right by one physical
display pixel.
Is this appropriate behaviour?
Consider the situation if the galpixel had in fact
been moving with an x-direction velocity of <#764#>minus<#764#> one-sixteenth
of a pixel per frame.
On the first frame, it would have <#765#>decremented<#765#> its fractional
position---initially zero---and ``reverse clocked'' back to a count of
15, simultaneously moving to the <#766#>left<#766#> by one physical display pixel.
But this is crazy---if it takes 16 frames to move one pixel right,
why does it only take one frame to move one pixel left, if it is supposed
to be moving at the <#767#>same speed<#767#> (i.e., 1/16 pixels per frame)?
Clearly, we have been careless about our arithmetic: we have been
<#768#>truncating<#768#> the fractional part off when deciding where to put
the pixel on the physical display; we should have been <#769#>rounding<#769#>
the fraction off.
Implementing this repair, then, we deem that, if a fractional position
is greater than or equal to one-half, the physical position of the galpixel
in the galpixmap matrix is incremented, and the fractional part is
decremented by 1.0.
(Of course, there is no need to <#770#>actually<#770#> do any decrementing in the
fractional bits---one just proclaims that the binary pattern 1000
represents the fraction -8/16; 1001 represents -7/16; and so on,
up to 1111 representing -1/16.)
With this repair, a galpixel with a constant
speed of 1/16 pixels per frame
(and initial fractional position of zero, by definition!) will
take 8 frames to move one pixel if moving to the right, and 9 frames
(by reverse clocking through the negative fractional values) if moving to the left.
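The accumulate-and-round behaviour just described can be summarised in a few lines; the following sketch is in C purely for illustration (in hardware the ``move'' is simply the carry out of a four-bit adder), and keeps the fractional position in units of 1/16 of a pixel:

\begin{verbatim}
/* Illustrative sketch only: frac16 is the fractional x position in
 * units of 1/16 pixel, vel16 the velocity in 1/16 pixel per frame.
 * When the running fraction reaches +1/2 pixel the galpixel moves one
 * column to the right; when it drops below -1/2 pixel it moves one
 * column to the left.                                                 */
void step_fractional_position(int *col, int *frac16, int vel16)
{
    *frac16 += vel16;
    if (*frac16 >= 8) {          /* reached +1/2: round up, move right  */
        *col    += 1;
        *frac16 -= 16;
    } else if (*frac16 <= -9) {  /* passed  -1/2: round down, move left */
        *col    -= 1;
        *frac16 += 16;
    }
}
/* With vel16 = +1 the column first changes on frame 8; with vel16 = -1
 * it first changes on frame 9; in both cases the long-run average is
 * one pixel per 16 frames.                                            */
\end{verbatim}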
There is one objection, however, that might be raised at this point:
we originally constructed our fractional position carefully, with the
correct number of bits, so that the galpixel would be able to wait up
to 16 frames before having to move one pixel.
Why have we now restricted this to only 8 (or 9) frames?
The answer is that we haven't, really; the motion still <#771#>is<#771#> at a speed
of 1/16 pixels per frame.
To see this, one need only continue the motion on for a longer time
period: the displayed pixel ``jumps'' at frames 8, 24, 40, and so on, which is indeed one pixel for every 16 frames, on average.
We now turn to the question of determining how many bits of accuracy
are required for the velocity and acceleration components themselves,
for the example of a system rated for 16 frames of propagation.
Let us, therefore, examine this question a little more closely.
Consider, now, not the i-th galpixel travelling with constant
<#786#>velocity<#786#>,
but, rather, travelling under the effect of a constant
<#787#>acceleration<#787#>.
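As a rough sketch of the arithmetic involved (assuming, as above, a rated propagation time of 16 frames and an error budget of one pixel per propagated quantity), the velocity and acceleration contribute v n and (1/2) a n^2 pixels respectively to the propagated position after n frames, so the smallest representable increments must satisfy
\[
  \delta v \times 16 \;\le\; 1
  \;\Longrightarrow\;
  \delta v \;\le\; \frac{1}{16}
  \quad\mbox{(4 fractional bits)},
  \qquad
  \frac{1}{2}\,\delta a \times 16^{2} \;\le\; 1
  \;\Longrightarrow\;
  \delta a \;\le\; \frac{1}{128}
  \quad\mbox{(7 fractional bits)}.
\]
(Here the velocity is measured in pixels per frame, and the acceleration in pixels per frame per frame.)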
What, then, does this requirement of 7 bits for
the fractional part of each component of the acceleration imply in practice?
We must now consider the problem
of how we should add together the various differing-accuracy
numbers in <#807#>PropPos<#807#>: in essence, the binary points of the position, velocity and acceleration terms must be aligned before the additions are performed, and the result rounded back to the precision of the stored position.
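The following sketch (illustrative only: the bit allocations shown follow the estimates sketched above, but the function and variable names are ours) makes the alignment explicit, with the position and velocity carrying 4 fractional bits and the acceleration 7:

\begin{verbatim}
/* Illustrative sketch only: propagate a position forward by t frames.
 * pos_f4 and vel_f4 carry 4 fractional bits (units of 1/16 pixel);
 * acc_f7 carries 7 fractional bits (units of 1/128 pixel/frame^2).
 * All terms are aligned onto the common 7-fractional-bit grid, added,
 * and the result rounded back to 4 fractional bits.                   */
long propagate_position(long pos_f4, long vel_f4, long acc_f7, long t)
{
    long pos_f7 = pos_f4 * 8;                /* align binary points     */
    long vel_f7 = vel_f4 * 8;
    long new_f7 = pos_f7 + vel_f7 * t + (acc_f7 * t * t) / 2;
    return (new_f7 + 4) >> 3;  /* round to 1/16 pixel (arithmetic shift
                                  assumed for negative values)          */
}
\end{verbatim}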
We now turn to the question of the accuracy required for the z-buffer
position, velocity and acceleration information.
Clearly, there is no advantage to thinking in terms of
``fractional'' bits
in the z direction <#816#>per se<#816#>---because
the visual information is not
matricised in that direction anyway.
Rather, one must simply allocate a sufficient number of
bits for the z-buffer
to ensure that the finest movement in this direction that the
application software requires
can be accurately <#817#>propagated<#817#>
over the rated propagation time.
An interesting problem arises when an existing hardware z-buffer is in place which, for Galilean-antialiasing purposes, is not of a sufficiently high number of bits to meet the design specifications for the applications intended for use. In such a case, it <#820#>is<#820#> useful to think of adding ``fractional bits'' to the z-buffer; these extra bits are then stored in a new <#821#>physical<#821#> memory device, but in all respects are <#822#>logically<#822#> appended to the trailing end of the corresponding z-buffer values already implemented in hardware. The controlling software may then choose to either compute z values to the full accuracy of this extended z-buffer; or, for backwards compatibility with older applications, may choose to simply specify only integral z-buffer values, using the fractional bits purely to ensure rated performance under propagation.
We now turn to the question of how many <#823#>integral<#823#> bits are
required for the velocity components.
One must, however, also take into account the <#829#>acceleration<#829#>, which makes its own contribution to the maximum excursion over the rated propagation time.
We can now start to plug in some typical, real-life figures for D
and
We have not, so far, considered the z buffer motional information.
As already noted, the z buffer comes in for different treatment,
because it is not matricised or displayed as the x and y components
are.
It <#870#>may<#870#> prove convenient, from a hardware design point of view,
to simply use the <#871#>same<#871#> motional structure for the z direction
as is used for the x and y directions.
However, in practice, if memory constraints are a concern,
one can allocate fewer bits to the z buffer motional data.
For example, for the 31-bit-per-component example given above (where
D = 1000 and
Thus, we find that, as a very rough estimate, we shall need about
12 bytes to store the motional and z-buffer data for each galpixel.
This may seem high; but at
We have, of course, not yet considered the <#877#>colour<#877#> or <#878#>shading<#878#>
information that must be stored with each galpixel.
Clearly, in time, this will be universally stored as 24-bit RGB colour
information, as display devices improve in their capabilities.
This is, of course, another three bytes of data per galpixel
that must be stored.
But the colour--shading question is, in fact, more subtle than this.
Consider what occurs as one walks past a wall in real life---which may,
say, be represented as a Gouraud-shaded polygon in the virtual
world: the light reflected from the
wall <#879#>changes in intensity<#879#> as we view it from successively
different angles.
Now, in the spirit of Galilean antialiasing, we should really provide
a method for the video controller to keep up this ``colour-changing
inertia'' as time progresses,
until the next update arrives, so that our beautifully
rendered or textured objects do not suddenly change colours, in
a ``flashing'' manner, whenever a new display update arrives.
How, then, should we encode this information?
And how do we compute it in the first place?
The answer to the latter question is relatively simple,
at least in principle:
we only
need to compute the <#880#>time-derivatives<#880#> of the primitive-shading
algorithm we are applying, and program this information into the
display processor as well; colour temporal derivatives may then
be generated at scan-conversion time.
The former question, however---encoding the information
efficiently---is a little more subtle.
The most obvious approach would be to simply store the instantaneous
time derivatives of the red, green and blue components of the colour
data for each pixel.
However, this approach ignores the fact that, <#881#>most<#881#> of the
time, the change in colour of the object will be solely a change
in <#882#>intensity<#882#>---hue and saturation will stay relatively constant.
(Most violations
of this statement---such as a red LED changing to
green---are due to
artificial man-made objects anyway; our visual systems
do not respond well to such shifts, evolving as we have under the light
from a single star.)
It is therefore prudent to encode our RGB information in such a way
that <#883#>intensity<#883#> derivatives can be given a relatively generous number
of bits (or, indeed, even second-derivative information),
whereas the remaining, hue- and saturation-changing pieces of information
can be allocated a very small number of bits.
On the other hand, this information must be regenerated, at video
scan-out speeds, into RGB information that can be added to the current
galpixel's colour values; we should not, therefore, harbour any
grand plans of implementing this encoding in a terribly clever,
but in practice unimplementable, way.
A good solution, with hardware in mind, would be to define three
new signals, A, B and C, related to the r, g and b signals
via the relations
<#884#>ABCFromRGB<#884#>
A = (1/3)(r + g + b),
B = (1/3)(2r - g - b),
C = (1/3)(r - 2g + b).
Clearly, A represents the intensity of the pixel in this coding
scheme: a generous number of bits may be allocated to storing
dA/dt---and maybe even a few for its second derivative; the derivatives of B and C, which carry the hue and saturation changes, can make do with only a handful of bits each.
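Because the A, B and C signals are linear combinations of r, g and b, exactly the same combinations convert the stored colour derivatives; and the inverse mapping requires nothing more than additions and subtractions, which is just what is wanted at scan-out speeds. The following sketch (illustrative only; the names are ours) shows both directions:

\begin{verbatim}
/* Illustrative sketch only: the forward transform applies equally well
 * to (dr/dt, dg/dt, db/dt), since it is linear; the inverse uses only
 * additions and subtractions, suitable for scan-out-speed hardware.   */
void abc_from_rgb(double r, double g, double b,
                  double *A, double *B, double *C)
{
    *A = (r + g + b) / 3.0;          /* intensity                      */
    *B = (2.0 * r - g - b) / 3.0;    /* hue/saturation difference      */
    *C = (r - 2.0 * g + b) / 3.0;    /* hue/saturation difference      */
}

void rgb_from_abc(double A, double B, double C,
                  double *r, double *g, double *b)
{
    *r = A + B;                      /* exact inverse of the above     */
    *g = A - C;
    *b = A - B + C;
}
\end{verbatim}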
We have now outlined, in broad terms, most of the hardware additions and modifications necessary to retrofit an existing system with a Galilean-antialiasing display system. There are, however, two final, important questions that must be addressed, that are slightly more algorithmic in nature: What happens when two galpixels are propagated to the <#896#>same<#896#> pixel in the new frame? And what happens if a pixel in the new frame is not occupied by <#897#>any<#897#> galpixel propagated from the previous frame? There must, of course, be answers supplied to these questions for any Galilean-antialiasing system to work at all; however, the <#898#>best<#898#> answers depend both on how much ``intelligence'' can be reliably crammed into a high-speed video controller, and on the nature of the images being displayed. In section~<#899#>LocalUpdate<#899#>, suggested changes to image generation philosophy will subtly shift our viewpoint on this question yet again; however, general considerations, at least as far as virtual-reality applications are concerned, will remain roughly the same.
We first consider the question of <#900#>galpixel clash<#900#>: when two galpixels are propagated forward to the same physical display pixel. Clearly, the video controller must know which galpixel should ``win'' in such a situation. This is the reason that we earlier insisted that hardware z-buffering <#901#>must<#901#> be employed: without this information, the video controller would be left with a dilemma. <#902#>With<#902#> z-buffer information, on the other hand, the video controller's task is simple: just employ the standard z-buffering algorithm.
Having dealt so effortlessly with our first question, let us now
turn to the second: what happens when there is an unoccupied galpixel
in the new frame buffer?
It might appear that the video controller cannot do <#903#>anything<#903#>
in such a situation: there simply is not enough information in
the galpixmap; whatever <#904#>should<#904#> now be in view must have been
obscured at the time of the last update.
While this is indeed true when the unoccupied pixel <#905#>does<#905#>,
in fact, correspond to an ``unobscured'' object---and will,
of course, require
some sort of acceptable treatment---it is <#906#>not<#906#> the only
situation in which empty pixels can arise.
Firstly, consider the finite nature of our arithmetic: it may well be
that a particular galpixel just happens to ``slip'' onto its
neighbour; where there were before two galpixels, there is now
only one.
This ``galpixel fighting'' leads to a mild attrition in the number
of galpixels as the video controller propagates from frame to frame;
this is not a serious problem, but it must
nevertheless be kept in mind.
Secondly, and more importantly, it must be remembered that we
are here rendering true, three-dimensional perspective images---<#907#>not<#907#>
simply two-dimensional ones: an object moving towards the observer occupies, frame by frame, an ever greater area of the display, and its propagated galpixels must therefore spread apart, leaving unoccupied pixels in between.
How, then, are we to treat this latter case? Clearly, the best idea would be to simply ``fill in'' the holes, with some smooth interpolation of colour, so that we get some sort of consistent ``magnification'' of the (admittedly still low-resolution) object that is approaching the observer. But before we implement such a strategy, it is necessary to note that <#914#>we must have some way of distinguishing unobscuration and expansion<#914#>. Why? Because, in general, we would like to apply a different solution to each of these two cases. Is it not possible to apply one general fix-all solution? After all, we have a fundamental deficit of information in the unobscuration case anyway---why not just use the expansion algorithm there too so that at least we have <#915#>something<#915#> to show? This sounds reasonable, and, indeed, might be the wisest course of attack in highly constrained retrofit situations. However, in general, we can do much better than this, for essentially no extra effort; we shall now outline these procedures.
We shall first take a nod at history, and consider the case of a <#916#>wire-frame<#916#> display system. While wire-frame graphics is destined to slip nostalgically into the long-term memories of workers in the field of computer graphics---and will, by the year 2001, probably only be seen at all by watching Stanley Kubrick's masterpiece of the same name---it nevertheless provides a simple proof-of-concept test-bed for prototyping Galilean-antialiased displays, and is so simple to implement that it is worth including here. In a wire-frame display system, the vast majority of the display is covered by a suitable background colour (black, or dark blue); on top of this background, lines are drawn to represent the edges of objects. That such graphics can look realistic at all is a testament to our edge-detection and pattern-recognition visual senses: quite literally, almost all of the visual information in the scene has been removed. This removal of information, however, is what makes unoccupied-pixel regeneration so simple in wire-frame graphics: the best solution is to simply <#917#>display the background colour<#917#> for such pixels. For sure, a part of a line might disappear if it happens to coincide with one that is closer to the observer in any given frame (although this problem can in fact be removed by using the enhancements of section~<#918#>LocalUpdate<#918#>); however, we assume that the rendering system is not too sluggish about providing updates anyway: the piece of line will only be AWOL for a short time. Overall, the accurate portrayal of <#919#>motion<#919#> far outweighs, in psychological terms, any problems to do with disappearing bits of lines.
Let us now turn to the case of <#920#>shaded<#920#>-image systems.
Will the approach taken for the wire-frame display system work
now?
It is clear that it will not: in general, there is no such thing as
a ``background'' colour: each primitive must be ``coloured in''
appropriately.
Thus, <#921#>even for the unobscuration case<#921#>, we must come up with
some reasonable image to put in the empty space: painting it
with black or dark blue would, in most situations, look simply shocking.
The solution that we propose is the following: <#922#>if a portion
of the display is suddenly unobscured, just leave
whatever image was there last frame<#922#>.
Why should we do this?
Well, for starters, we have nothing better to paint there.
Secondly, if the display update is not too long in coming, this
trick will tend to mimic CRT phosphor persistence: it looks as if
that part of the image just took a little bit longer to decay away.
This ``no-change'' approach thus fools the viewer into thinking
nothing too bad has happened, since the fact that that part of the
display ``does not look up-to-date'' will not truly register,
consciously, for (say) five or ten frames anyway.
Thirdly, and most revealingly, <#1671#>this approach yields no worse
a result than is already true for conventional
sample-and-hold displays<#1671#>, which simply hold the entire previous
image, unchanged, until the next update arrives.
Of course, this explanation is a tad more mischievous than
it appears: mixing freshly-propagated galpixels with stale ``debris'' in
the one image is not quite the same thing as holding a whole,
self-consistent frame.
We now consider in more detail how this mixing of old and new information
actually comes about. Conceptually, the video controller's first task is
to copy the entire contents of the old frame buffer, unchanged, into the
new frame buffer, marking every galpixel so copied as ``debris''.
The next task for the video controller is to simply perform the propagation proper: each galpixel of the old frame buffer is propagated forward to its new position in the new frame buffer, overwriting any debris (or any more distant galpixel) that it lands on.
At the end of this two-pass process, the new frame buffer will contain as much true Galilean-antialiased information as possible; the remaining unoccupied galpixels simply let the debris ``show through''. (Of course, we have not yet considered the surfaces that are ``expanding''; this will come shortly.)
The prospective -ed display designer might, by this point,
be worrying about the fact that we have introduced a <#950#>two-pass<#950#>
process into the video controller's propagation procedure:
nanoseconds are going to be tight anyway;
do we really need to double the procedure just for the sake
of copying across debris?
Fortunately, the practical procedure need only be a <#951#>one<#951#>-pass
one, when done carefully, as follows:
Firstly, the new frame buffer must have all of its debris
indicators set to true; nothing else need be cleared.
This will be especially easy when the debris indicator is a
dedicated set of debris flags that are all located on the same
memory chip: flags can be hardware-set <#952#>en masse<#952#>.
Next, the video controller scans through the old frame buffer, galpixel
by galpixel, retrieving information from each in turn.
Two parallel arms of the video controller's circuitry now
come into play.
The first proceeds to check
to see whether the <#953#>debris<#953#> from that particular
galpixel may be copied across:
it checks the corresponding galpixel in the new frame buffer; if it
is debris, it overwrites it (because debris is copied across one-to-one
anyway, so that the debris that is there
must be simply the cleared-buffer message);
if it is a non-debris galpixel, it leaves it alone.
Simultaneously, the second parallel arm of the video controller circuitry
propagates the galpixel according to the propagation equations, writing it into its new position in the new frame buffer (subject, as always, to the z-buffer test).
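In sequential C (purely as an illustration of the logic; the two ``arms'' are, of course, parallel circuits in the real thing, and the buffer layout, types and the propagate() helper below are ours), the single sweep looks something like this:

\begin{verbatim}
/* Illustrative sketch only.  Smaller z is taken to mean "closer".     */
#define ROWS 1000
#define COLS 1000

typedef struct { long z; long colour; long vel[3], acc[3];
                 int debris; } galpixel;

extern galpixel old_buf[ROWS][COLS], new_buf[ROWS][COLS];

/* Hypothetical helper: propagate galpixel g (currently at row r,
 * column c) one frame forward, returning its new row, column and
 * contents.                                                           */
extern void propagate(const galpixel *g, int r, int c,
                      int *nr, int *nc, galpixel *moved);

void sweep_old_buffer(void)
{
    int r, c, nr, nc;
    for (r = 0; r < ROWS; r++)
        for (c = 0; c < COLS; c++) {
            galpixel g = old_buf[r][c], moved;

            /* Arm 1: copy across as debris, unless a real (propagated)
             * galpixel has already claimed this location.             */
            if (new_buf[r][c].debris) {
                new_buf[r][c] = g;
                new_buf[r][c].debris = 1;
            }

            /* Arm 2: propagate; on arrival, debris always loses, and
             * otherwise the ordinary z-buffer test decides.           */
            propagate(&g, r, c, &nr, &nc, &moved);
            if (nr >= 0 && nr < ROWS && nc >= 0 && nc < COLS &&
                (new_buf[nr][nc].debris || moved.z < new_buf[nr][nc].z)) {
                new_buf[nr][nc] = moved;
                new_buf[nr][nc].debris = 0;
            }
        }
}
\end{verbatim}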
As an aside, it is worth noting, at this point, that the prospects for implementing significant parallelism in the video controller circuitry <#957#>in general<#957#> are reasonably good, if a designer so wishes to---or needs to---proceed in that direction. One approach would be to slice the display up into smaller rectangular portions, and assign dedicated controller circuitry to each of these portions. Of course, such implementations need to correctly treat memory contention issues, since a galpixel in any part of the display can, in general, be propagated to any other part of the display by the next frame. The extent to which this kneecaps the parallel approach depends intimately on the nature of the memory hardware structure that is implemented. Furthermore, the video controller is subject to a rock-solid deadline: the new frame buffer <#958#>must<#958#> be completely updated before the next tick of the frame clock; this further complicates the design of a reliable parallel system. However, this is a line of research that may prove promising.
We now return to the problem of empty-pixel regeneration. We have, above, outlined the general plan of attack in <#959#>unobscuration<#959#> situations. But what about <#960#>expansion<#960#> situations? We should first check to see if using debris solves this problem already. Imagine an object undergoing (apparent) expansion on the display. Just say a pixel near the centre becomes a ``hole''. What will the debris method do? Well, it will simply fill it in with the <#961#>old<#961#> central pixel---which looks like a reasonably good match! But now consider a hole that appears near the boundary of the expanding object. The debris method will now fill in whatever was <#962#>behind<#962#> that position in the previous frame---which could be any other object. Not so good. Now consider an even worse situation: the object is apparently moving towards the viewer <#963#>as well as moving transversely<#963#>: in the new frame, it occupies a new position, but its own debris is left behind. Again, objects behind the one approaching will ``show through'' the holes. Not very satisfactory at all.
We must, therefore, devise some way in which the video controller can tell whether a missing pixel is due to unobscuration, or whether it is due to expansion of an object. But this test <#964#>must be lightning-fast<#964#>: it must essentially be hard-wired to be of any use in the high-speed frame buffer propagation process. Undaunted, however, let us consider, from first principles, how this test might be carried out. Firstly, even in abstract terms, how would one distinguish between unobscuration and expansion anyway? Well, we might look at the <#965#>surrounding galpixels<#965#>, and see what patterns they form. How does this help? Well, in an <#966#>expansion<#966#> situation, the surrounding pixels will all lie on a three-dimensional, roughly smooth surface. On the other hand, on the edge of a patch of <#967#>unobscured<#967#> galpixels, the surrounding pixels will be discontinuous, when looked at in three-space: the object moving in front of the one behind must be at least some <#968#>measurable<#968#> z-distance in front of the other (or else z-buffering would be screwed up anyway). So, in abstract terms, our task is to examine the surrounding pixels, and see if they all lie on the same surface.
How is this to be done in practice, much less in hardware?
To investigate this question, we first note that we are only interested
in a <#969#>small area<#969#> of the surface; in such a situation, one may
treat the surface as <#970#>approximately flat<#970#>, whether it really
<#971#>is<#971#> flat (e.g., a polygon) or not (e.g., a sphere).
(If the surface is not really flat, and only subtends a few pixels'
worth on the display, in which area the curvature of the surface is
significant, then this argument breaks down; the following
algorithm will mistakenly think unobscuration is happening; but, in this
situation, such a small rendering of
the object is pretty well unrecognisable anyway,
and a hole through it is probably not going to bring the world to
an end.)
Now, we can always approximate a sufficiently flat surface by a
<#972#>plane<#972#>, at least over the section of it that we are interested in.
Consider, now, an <#973#>arbitrary<#973#>
surface in three-space, which we assume
can be expressed in the form z = z(x, y), i.e., the z component at
any point is specified according to some definite
single-valued function of x
and y.
(Our display methodology automatically ensures only single-valued
functions occur anyway, due to the hidden-surface removal property
of z-buffering.)
Now, if that surface is a <#974#>plane<#974#>,
z will (by definition) be simply a linear function of x and
y:
<#975#>PlaneDef<#975#>
z(x, y) = a x + b y + c,
where a, b and c are constants. The Laplacian of such a linear function
vanishes identically, so a non-zero source term in the corresponding
Poisson equation signals that the galpixels in the neighbourhood do not
all lie on the one (locally) planar surface.
``OK, then,'' our recently-nauseated readers ask, ``How on earth
do we compute Poisson's equation, <#1047#>Poisson<#1047#>, in our
video controllers?
What sort of complicated, slow, expensive,
floating-point chip will we need for
<#1048#>that<#1048#>?''
The answer is, of course, that computing <#1049#>Poisson<#1049#> is
<#1050#>especially<#1050#> easy if we have z-buffer information on a rectangular
grid (which is precisely what we <#1051#>have<#1051#> got!).
Why is this so?
Well, consider the Laplacian operator on a discrete grid of unit pixel spacing: its standard five-point approximation involves nothing more than summing the z values of the four nearest neighbours of a galpixel and subtracting four times the z value of the galpixel itself; for galpixels lying on a common plane, the result is identically zero.
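Concretely (and purely as an illustration; the function, its argument layout and the tolerance below are ours), the test at a pixel (x, y) amounts to four additions, one multiplication by four (a shift), and a comparison:

\begin{verbatim}
/* Illustrative sketch only: five-point approximation to the Laplacian
 * of the z-buffer on a unit pixel grid.  For galpixels lying on a
 * single plane z = a*x + b*y + c the source term is exactly zero, so a
 * value well away from zero flags the edge of a surface.  The caller
 * must keep 0 < x < width-1 and 0 < y < height-1.                     */
#define Z_TOLERANCE 4   /* absorbs z-buffer quantisation noise (nominal) */

int poisson_edge(const long *z, int width, int x, int y)
{
    long source = z[y * width + (x + 1)] + z[y * width + (x - 1)]
                + z[(y + 1) * width + x] + z[(y - 1) * width + x]
                - 4 * z[y * width + x];
    return (source > Z_TOLERANCE || source < -Z_TOLERANCE);
}
\end{verbatim}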
OK, then, we have shown that computing the Poisson test for any
pixel is something that we can definitely do in hardware.
How do we use this information,
in practice, to perform an ``expansion'' of the image of
an object moving towards us?
Well, this is an algorithmically straightforward task;
it does, however, require some extra memory on the part of the
video controller, as well as sufficient speed to be able to perform
(yes, it must be admitted)
<#1120#>two<#1120#> sweeps: once through the old frame buffer (as described above),
and once to go through the new frame buffer.
The general plan of attack is as follows.
The first sweep, through the old frame buffer, follows the
procedure outlined above: both debris and propagated galpixels are
moved to the new frame buffer.
To this first sweep, however, we add yet
<#1121#>another<#1121#> piece of parallel circuitry
(in addition to two circuits already there---that copy debris and
propagate galpixels respectively), that computes the Laplacian of
z-buffer information according to the approximation
<#1122#>PoissonFinal<#1122#>, as each galpixel is encountered in turn.
For each galpixel, it determines whether the Poisson test is satisfied
or not, and stores this information in a <#1123#>Poisson flag map<#1123#>.
This 1-bit-deep matrix simply stores ``yes'' or ``no'' information
for each galpixel: ``yes'' if there is a non-zero Poisson source,
``no'' if the computed source term is compatible with zero.
This Poisson flag map is only a modest increase in memory requirements
(only 120~kB for even a 1000 x 1000 display).
Once the first sweep has been completed (and not before), a second sweep is made, this time of the <#1129#>new<#1129#> frame buffer. At the beginning of this sweep, the new frame buffer consists solely of propagated galpixels and, where they are absent, debris; and the Poisson flag map contains the ``yes--no'' Poisson test information about every galpixel in the <#1130#>old<#1130#> frame buffer. The video controller now scans through the new frame buffer, looking for any pixels marked as debris. When it finds one, it then looks at the <#1131#>previous<#1131#> galpixel in the new frame buffer (i.e., with a regular left-to-right, top-to-bottom scan, the pixel immediately to the left of the debris). Why? The reasoning is a little roundabout. First, assume that the unoccupied pixel of the new frame buffer <#1132#>is<#1132#>, in fact, within the boundaries of a smooth surface. If that <#1133#>is<#1133#> true, then the galpixel to its left must be inside the surface too. If we then <#1134#>reverse-propagate<#1134#> that galpixel (the one directly to the left) back to where it <#1135#>originally<#1135#> came from in the old frame buffer (as, of course, we <#1136#>can<#1136#> do---simply reversing the sign of t in the propagation equation, using the same hardware as before), then we can simply check its Poisson flag map entry to see if, in fact, it <#1137#>is<#1137#> within the boundaries of a surface. If it is, then all our assumptions are in fact true; the debris pixel in the new frame buffer is within the surface; we should now invoke our ``expansion'' algorithm (to be outlined shortly). On the other hand, if the Poisson flag map entry turns out to show that the reverse-propagated galpixel is <#1138#>not<#1138#> within the boundaries of a surface, then our original assumption was false: we have (as best as we can determine) a <#1139#>just-unobscured<#1139#> pixel; for such a pixel, debris should simply be displayed---but since there already <#1140#>is<#1140#> debris there, we need actually do nothing more.
However, we must, in practice, add some ``meat'' to the above algorithm to make it a little more reliable. What happens when the galpixel to the left was <#1141#>itself<#1141#> on the left edge of the surface? In that situation, we <#1142#>do<#1142#> want to fill in the empty pixel as an ``expansion''---but the Poisson test tells us (rightly, of course) that the pixel to the left is on an edge of something. In an ideal world, we could always ``count up'' the number of edges we encounter to figure out if we are entering or leaving a given expanding surface. But, even apart from the need to treat special cases in mathematical space, such a technique is not suitable at all in the approximate, noisy environment of a pixel-resolution <#1143#>numerical<#1143#> Poisson test. One solution to the problem is to leave such cases as simply not correctly done: holes may appear near the edges of approaching objects; we have, at least, covered the vast majority of the interior of the surface. A better approach, if at least one extra memory fetch cycle from the old frame buffer is possible (and, perhaps thinking wishfully, up to three extra fetch cycles), is to examine not only the galpixel in the new frame buffer to the <#1144#>left<#1144#> of the piece of debris found, but also the ones <#1145#>above<#1145#>, to the <#1146#>right<#1146#>, and <#1147#>below<#1147#>. If <#1148#>any<#1148#> of these galpixels are in a surface's interior, then the current pixel probably is too.
On the other hand, what is our procedure if <#1149#>all<#1149#> of the surrounding galpixels that we have time to fetch are debris? Then it means that, most likely, we are in the interior of a <#1150#>just-unobscured<#1150#> region of the new frame buffer, and we should leave the debris in the current pixel there. This is how the algorithm ``bootstraps'' its way up from purely edge-detection to actually filling in the <#1151#>interior<#1151#> of just-unobscured areas. In any case, the video controller, in such a case, would have no information to change the pixel in question anyway---the correct answer is forced on whether we like it or not!
Let us now return to the question of what our ``expansion algorithm'' should be, for pixels that we decide, by the above method, <#1152#>should<#1152#> be ``blended'' into the surrounding surface. There are many fancy algorithms that one might cook up to ``interpolate'' between points that have expanded; however, we are faced with the dual problems that we have very little time to perform such feats, and we do not really know for sure (without some complicated arithmetic) just where the new pixel <#1153#>would<#1153#> have been in the old frame buffer. Therefore, the simplest thing to do is simply to replicate the galpixel to the left---colour, derivatives, velocity, and all---except that its <#1154#>fractional position<#1154#> should be zeroed (to centre it on its new home). If we <#1155#>did<#1155#>, perchance, fetch <#1156#>two<#1156#> surrounding galpixels---say, the ones to the left and to the right---then a simple improvement is to <#1157#>average<#1157#> each component of RGB (which only requires an addition, and a division by two---i.e., a shift right). Finally, if we have really been luxurious, and have fetched <#1158#>all four<#1158#> immediately-surrounding pixels, then again we can perform a simple averaging, by using a four-way adder, and shifting two bits to the right. Considering that it is a fairly subtle trick to ``expand'' surfaces in the first place, and that a display update will soon be needed to fill in more detail for this object approaching the viewer in any case, any of these interpolative shading approaches should be sufficient in practice---even if, from a computer graphics point of view, they are fairly primitive.
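As a concrete (and, again, purely illustrative) rendering of this shading repair, the averaging of whichever neighbours were fetched costs only adders and shifters:

\begin{verbatim}
/* Illustrative sketch only: average the colour of the 1, 2 or 4
 * in-surface neighbours that were fetched, using only additions and
 * right shifts (divide by two or by four), as described in the text.  */
typedef struct { unsigned char r, g, b; } rgb;

rgb expansion_fill(const rgb *nbr, int count)   /* count is 1, 2 or 4 */
{
    int r = 0, g = 0, b = 0, i;
    rgb out;
    for (i = 0; i < count; i++) {
        r += nbr[i].r;  g += nbr[i].g;  b += nbr[i].b;
    }
    if (count == 2)      { r >>= 1; g >>= 1; b >>= 1; }
    else if (count == 4) { r >>= 2; g >>= 2; b >>= 2; }
    out.r = (unsigned char)r;
    out.g = (unsigned char)g;
    out.b = (unsigned char)b;
    return out;
}
\end{verbatim}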
Now, the complete explanation above of this ``second frame sweep'' seems to involve a fair bit of ``thinking'' on the part of the video controller: it has to look for debris; <#1159#>if<#1159#> it finds it, it then has to fetch the galpixel to the left; it then reverse-propagates it back to the old frame buffer; it then fetches its Poisson flag map entry; it then decides whether to replicate the galpixel to the left (if in a surface) or simply leave debris (if representing unobscuration); all of these steps seem to add up to a pretty complicated procedure. What if it had to do this for almost every galpixel in the new frame buffer? Can it get it done in time? The mode of description used above, however, was presented from a ``microprocessor'', sequential point of view; on the other hand, all of the operations described are really very trivial memory fetch, add, subtract, and compare operations, which can be hard-wired trivially. Thus, <#1160#>all<#1160#> of the above steps should be done <#1161#>in parallel<#1161#>, for <#1162#>every<#1162#> galpixel in the second sweep, <#1163#>regardless<#1163#> of whether it is debris or not: the operations are so fast that it will not slow anything down to have all that information ready in essentially one ``tick of the clock'' anyway. Thus, the only overhead with this process is that one needs to have time to perform both the first and second sweeps---which, it should be noted, <#1164#>cannot<#1164#> be done in parallel.
There is, however, a more subtle speed problem that is somewhat hidden in the above description of the Poisson test. This speed problem is (perhaps surprisingly) not in the second sweep at all---but, rather, in the <#1165#>first<#1165#>. The problem arises when we are trying to compute the Poisson test, according to <#1166#>PoissonFinal<#1166#>, for each galpixel: this requires fetching <#1167#>five<#1167#> entries of z-buffer information at the same time. ``But,'' the reader asks, ``you have described many times procedures that require the simultaneous fetching of several pieces of memory. Why is it only now a problem?'' The subtle answer is that, in those other cases, the various pieces of information that needed to be simultaneously fetched were always <#1168#>different<#1168#> in nature. For example, to compute the propagation equation <#1169#>PropPos<#1169#>, we must fetch all three components of motional information simultaneously. The point is that <#1170#>these pieces of information can quite naturally be placed on different physical RAM chips<#1170#>---which can, of course, all be accessed at the same time. On the other hand, the five pieces of z-buffer information that we require to compute <#1171#>PoissonFinal<#1171#> will most efficiently be stored on the <#1172#>one<#1172#> set of RAM chips---which requires that <#1173#>five separate memory-fetch cycles<#1173#> be initiated (or, at the very least, something above three fetch cycles, if burst mode is used to retrieve the three value entries on the same row as the pixel being tested). This problem would, if left unchecked, ultimately slow down the whole first-sweep process unacceptably.
The answer to this problem is to have
an extra five small memory chips (referred to below as the Poisson
``fingers''), each with
just enough memory to store a <#1174#>single row<#1174#> of z-buffer information.
(In fact, to store the possible results from equation~<#1175#>PoissonFinal<#1175#>,
they need three more bits per pixel than the z-buffer contains,
because the overall computation of Poisson's equation has a
range eight times that of a single z-buffer value.)
It now simply remains to be specified how the Poisson finger information is used. At any given pixel being processed during the first sweep, the information necessary to finish the computation of the Poisson test for the galpixel <#1181#>directly above<#1181#> the current pixel becomes available. In our previous description, we simply said that this Poisson finger is <#1182#>written<#1182#> to, like the other three surrounding pixels; in fact, all that is necessary is that it be <#1183#>read<#1183#> (and not rewritten), added to this final piece of z-buffer information, and passed through to the Poisson test checker (the circuitry that determines whether the source term in the Poisson equation is close enough to zero to be zero or not).
At the end of scanning each row in the first sweep, the video controller circuitry must multiplex the contents of the three centre-row Poisson fingers back into a single logical finger, and ``scroll'' them up to the top finger; likewise, the finger below must be demultiplexed into three sets, to be placed in the three central-row fingers; and the finger below must in turn be zeroed, ready to start a new row. These memory-copy operations can be performed in burst mode, and only need be done at the end of each scan-row. Alternatively, for even better performance, <#1184#>nine<#1184#> Poisson fingers may be employed, with three each for the row being scanned and the rows directly above and below; the chip-addressing multiplexers can then simply be permuted at the end of each scan-row, so that no bulk copying need take place at all. The only remaining task is then to <#1185#>zero<#1185#> the set of fingers corresponding to the row that is becoming the new bottom row; the chosen memory chip should have a line that clears all cells simultaneously. Of course, with the nine-finger option, each chip need only store the information for a third of a row, since it will always be allocated to the same position in the (0, 1, 2) cyclic sequence.
We have now completed our considerations on what hardware additions
and modifications are required for a minimal implementation
of Galilean antialiasing; the properties of the resulting display system
will, of course, only be realised if suitable software drives it, and it
is to this software that we now turn.
<#1190#>MinSoftwareMods<#1190#><#1191#>Software Modifications and Additions<#1191#> In the previous section, the minimal additions to hardware necessary to implement Galilean antialiasing immediately, on an existing system, were described. In this section, we consider, in a briefer form, the software changes that are needed to drive such a system. Detailed formulas will, in the interests of space, not be listed; but, in any case, they are simple to derive from first principles, following the guidelines that will be given here.
Our first topic of consideration must be the <#1192#>display processor<#1192#>,
which has been treated in a fairly nebulous way up to now.
The display processor must, at the very least, scan-convert all
of the appropriately clipped, transformed primitives passed to it
from further down the graphical pipeline.
In a Galilean-antialiased system, it must do more: for every pixel that it
scan-converts, it must also supply the velocity and acceleration
information that the corresponding galpixel requires, together with the
time derivatives of the colour data.
The first additional task, that of interpolating velocity and acceleration information, is particularly simple to implement if the logical display device is a planar viewport on the virtual world, and if all of the primitives to be rendered are planar (polygons, lines, and so on). In such systems, the linear nature of the perspective transformation means that lines in physical space correspond to lines in display space, and, as a consequence, planar polygons in physical space correspond to planar polygons in display space. But if we take successive time derivatives of these statements, we find that the velocities, accelerations, and so on, of the various points on a line, or the surface of a polygon, are simply <#1194#>linear interpolations<#1194#> of the corresponding quantities at the vertices of the line or polygon in question. Thus, the same algorithms that are used to linearly interpolate z-buffer values can be used for the velocity and acceleration data also; we simply need a few more bits of hardware to carry this task out.
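As a purely illustrative sketch (the function name and numbers are hypothetical), the span interpolation might look like the following; exactly the same routine serves for z, for each velocity component, and for each acceleration component.

    def interpolate_span(x0, q0, x1, q1, x):
        """Linearly interpolate a per-vertex quantity q (z, a velocity
        component, an acceleration component, ...) along a scanline."""
        if x1 == x0:
            return q0
        t = (x - x0) / (x1 - x0)
        return q0 + t * (q1 - q0)

    # Example: the screen-space x-velocity at pixel x = 12, on a span whose
    # left edge (x = 10) moves at 2.0 pixels/update and whose right edge
    # (x = 20) moves at 0.5 pixels/update:
    vx = interpolate_span(10, 2.0, 20, 0.5, 12)   # -> 1.7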
Computing the time derivatives of the colour data, on the other hand, is a little more complicated. In general, one must simply write down the complete set of fundamental equations for the illumination model and shading model of choice, starting from the fundamental physical inputs of the virtual world (such as the position, angle, brightness, and so on, of each light source; the direction of surface normals of the polygons and the positions of their vertices; and so on), and proceeding right through to the ultimate equations that yield the red, green and blue colour components that are to be applied to each single pixel. One must then carefully take the time-derivative of this complete set of equations, to compute the first derivative of the red, green and blue components in terms of the fundamental physical quantities (such as light source positions, and so on), and their derivatives. This is easier said than done; the resulting equations can look very nasty indeed. So far, however, the described procedure has been purely mathematical: no brains at all needed there, just a sufficiently patient computer (whether human or machine) to compute derivatives. The real intelligence comes in distilling, from this jungle of equations, those relations that have the most effect, <#1195#>psychologically<#1195#>, on the perceived temporal smoothness of the colour information presented in a real-life session. There are innumerable questions that need to be investigated in this regard; for example, if our shading model interpolates intensities across a surface, should we take the time derivative of the interpolation, or should we interpolate the time derivatives? The former approach will clearly lead to a greater <#1196#>consistency<#1196#> between updates, but the latter is almost certainly easier to compute in real life. Such questions can only be answered with a sufficient amount of focused, quality research. The author will not illustrate his ignorance of this research any further, but will rather leave the field to the experts.
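The purely mechanical part of the procedure, at least, is easily sketched: here is a minimal example for a single Lambertian diffuse term, I = kd (N . L). This is an assumed stand-in for whatever illumination model a real system uses, and the clamping of the dot product at zero (ignored below) is precisely the kind of detail that raises the psychological questions just mentioned.

    # Sketch only: time-differentiating one simple illumination term by the
    # product rule.  A real shading model would be differentiated in the same
    # mechanical way, term by term.

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def diffuse_and_derivative(kd, N, dN_dt, L, dL_dt):
        """Return (I, dI/dt) for I = kd * (N . L), ignoring the clamp at zero."""
        I = kd * dot(N, L)
        dI_dt = kd * (dot(dN_dt, L) + dot(N, dL_dt))
        return I, dI_dt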
A more subtle problem arises when we consider <#1197#>optically sophisticated models<#1197#>. By this term, we mean the use of a rendering model that simulates, to a greater or lesser extent, some of the more subtle aspects of optics in real life: the use of shadows; the portrayal of transparent and translucent objects; the use of ray-tracing. While these techniques are (for want of gigaflops) only modelled crudely, if at all, in current systems, there is nevertheless a more or less unstated feeling that these things will be catered for in the future, when processing power increases. The author, however, has serious doubts about this goal. For sure, spectacular simulations of the real world are vitally important proof-of-concept tools in conventional computer graphics; and there are a large number of potentially powerful applications that can make excellent use of such capabilities. These applications---and, indeed, new ones---will also be, in time, feasible targets for the industry: simulating the real world will be a powerful domestic and commercial tool. But will be a poor field indeed if it is limited to simply simulating with ever-increasing levels of realism. Rather, it will be the imaginative and efficient programmes of <#1198#>virtual world design<#1198#> that will be the true cornerstone of the industry in years to come. Do not replicate the mistakes of Pen Computing---otherwise known as How You Can Build a $5000 Electronic Gadget to Simulate a $1 Notepad--- will be very interested.
To illustrate what a mistake it would be to pursue simulation to the exclusion of everything else, it is sufficient to note that even the field of ``sexy graphics'' (that can produce images of teddy-bears and office interiors practically indistinguishable from the real thing) is usually only equipped to deal with a very small subset of the physical properties of the real world. Consider, as an example, what happens if one shines a green light off a large, flat, reflecting object that is coming towards you. What do you see? Obviously, a green light. What happens if the object is moving towards you very fast? So what; it still looks green. But what if it is moving <#1199#>really<#1199#> fast, from a universal point of view? What then? Ah, well, then the real world diverges from computer graphics. In the real world, the object looks blue; on the screen, however, it still looks green.
``But that's cheating,'' a computer graphics connoisseur argues, ``Who cares about the Doppler effect for light in real life?'' Excluding the fact that thousands of cosmologists would suddenly jump up and down shouting ``Me! Me!'', just imagine that we <#1200#>do<#1200#> want to worry about it: what then? Well, perhaps in the days of punch cards and computational High Priests, the answer would have been, ``Too bad. Stop mucking around. We only provide simulations here.'' But this attitude will not be tolerated for a minute in today's ``computationally liberated'' world.
Of course, the algorithms for ray-tracing may be modified, quite trivially, to take the Doppler effect into account. But what if we now wanted to look at a lensed image of a distant quasar (still a real-world situation here, not a virtual one); what then? Ah, well, yes, well we'd have to program General Relativity in as well. Well what about interference effects? Er..., OK, I think I can program in continuous fields. The photoelectric effect? Optical fibres? Umm..., yes, well..., well what on Earth do you want all this junk for anyway?!
The point, of course, is that, on Earth, we don't. But that doesn't mean that people want their brand-new fancy-pants ray-traced system to be the late-1990s version of a Pen Computer! If someone wants to view a virtual world in the infra-red, or ultra-violet, or from the point of view of high-energy gamma rays for that matter, why stop them? What a system must do, as best it can, is stimulate our senses in ways that we can comprehend, <#1201#>but for the sole purpose of being information inputs to our brains<#1201#>. If we want to simulate the ray-traced, sun-lit, ultra-low-gravity, ultra-low-velocity, ultra-low-temperature world that is our home on the third planet around an average-sized star, two-thirds of the way out of a pretty ordinary-looking galaxy, then that is fine. Such applications will, in many cases, be enormously beneficial (and profitable) to the participants. But it is a very small fragment of what <#1202#>can<#1202#>---and will---be done.
So where is all of this meaningless banter leading? The basic point is this: already, with wire-frame graphics, the early systems were able to give a surprisingly good feeling of ``presence'' in the virtual world. Flat-shaded polygons make this world a lot ``meatier''. Interpolative shading, curved surfaces and textures make the virtual world look just that little bit nicer. However, we are rapidly reaching the saturation limit of how much information can be absorbed, via these particular senses, by our minds; making the <#1203#>visible<#1203#> virtual world even more ``realistic'' will not lead, in itself, to much more of an appreciation of the information we are trying to understand with the technology (although history already has places reserved for the many advances in stimulating our <#1204#>other<#1204#> senses that will heighten the experience---aural presence being excitingly close to being commercially viable; force and touch being investigated vigorously; smell and taste being in the exploratory phases). When you walk around a (real) office, for example, do you have to stop and stare in awe because someone turns another light on? Can you still recognise your car if it is parked in the shade? Does the world look hallucinogenically different under fluorescent lighting than under incandescent lighting? The simple fact is that <#1205#>our brains have evolved to normalise out much of this lighting information as extraneous<#1205#>; it makes sexy-looking demos, but so do Pen Computers.
It may be argued, by aficionados of the field of optically sophisticated modelling, that the author's opinions on this topic are merely sour grapes: machines can't do it in real-time right now, so I don't want it anyway. This could not be further from the truth: ideas on how these sophisticated approaches might be reconciled with the techniques outlined in this are already in the germination stage; but they are of a purely speculative nature, and will remain so until hardware systems capable of generating these effects in practical virtual-world-application situations become widespread. Implementing optically sophisticated techniques in will, indeed, be an important commercial application for the technology in years to come. But this is all it will be: <#1206#>an<#1206#> application, not the whole field. It will not be part of general-purpose hardware; it will be a special-purpose, but lucrative, niche market. It will be a subdiscipline.
All that remains is for consumers to decide What on Virtual Earth they want to do with the rest of their new-found power.
<#1207#>Enhancements<#1207#><#1208#>Advanced Enhancements<#1208#> The previous section of this outlined how a minimal modification of an existing system might be made to incorporate ing. In this section, several more advanced topics, not required for such a minimal implementation, are contemplated. Firstly, a slight interlude away from the rigours of ing is taken: in section~<#1209#>WrapAround<#1209#>, a simple-to-follow review of the fundamental problems facing the designers of <#1210#>wrap-around head-mounted display systems<#1210#> is offered; the technical material presented, however, is not in any way new. After this respite, section~<#1211#>LocalUpdate<#1211#> renews the charge on ing proper, and suggests further ways in which systems can be optimised to present visual information to the participant in the most psychologically convincing way that technological constraints will allow.
<#1212#>WrapAround<#1212#><#1213#>Wrap-Around Head-Mounted Display Design<#1213#> In conventional computing environments, the display device is generally flat and rectangular, or, in light of technological constraints on CRT devices, as close to this geometry as can be attained. The display device is considered to be a ``window on the virtual world'': it is a planar rectangular viewport embedded in the real world that the viewer can, in effect, ``look through'', much as one looks out a regular glass window at the scenery (or lack thereof) outside.
In a similar way, most of the mass-marketed <#1214#>audio<#1214#> reproduction equipment in use around the world aims to simply offer the aural equivalent of a window: a number of ``speakers'' (usually two, at the present time) reproduce the sounds that the author of the audio signal has created; but, with the exception of quadrophonic systems and the relatively recent Dolby Surround Sound, such systems have not attempted to portray to the listener any sense of <#1215#>participation<#1215#>: the listener is (quite literally) simply a part of the <#1216#>audience<#1216#>---even if the reproductive qualities of the system are of such a high standard that the listener may well believe, with eyes closed, that they are actually sitting in the audience at a live performance.
Linking these two technologies together, ``multimedia'' computing---which has only exploded commercially in the past twelve months---heightens the effectiveness of the ``window on a virtual world'' experience greatly, drawing together our two most informationally-important physical senses into a cohesive whole. High-fidelity television and video, in a similar way, present a ``read-only'' version of such experiences.
, on the other hand, has as its primary goal <#1217#>the removal of the window-frame, the walls, the ceiling, the floor<#1217#>, that separate the virtual world from the real world. While the prospect of the demise of the virtual window might be slightly saddening to Microsoft Corporation (who will, however, no doubt release Microsoft Worlds 1.0 in due time), it is an extremely exciting prospect for the average man in the street. Ever since Edison's invention of the phonograph just over a century ago, the general public has been accustomed to being ``listeners''---then, later, ``viewers'' (<#1680#>à~la<#1680#> Paul Hogan's immortal greeting, ``G'day viewers'')---and finally, in the computer age, ``operators''; now, for the first time in history, they have the opportunity of being completely-immersed <#1219#>participants<#1219#> in their virtual world, with the ability to mould and shape it to suit their own tastes, their own preferences, their own idiosyncrasies.
Attaining this freedom, however, requires that the participant is convinced that they <#1220#>are<#1220#>, truly, immersed in the virtual world, and that they have the power to mould it. There are many, many challenging problems for designers to consider to ensure that this goal is in fact fulfilled---many of them still not resolved satisfactorily, or in some cases not at all. In this , however, we are concerned primarily with the visual senses, to which we shall restrict our attention. Already, in sections~<#1221#>BasicPhilosophy<#1221#> and~<#1222#>MinimalImplementation<#1222#>, we have considered how best to match computer-generated images to our visual motion-detection senses. However, we have, to date, assumed that the ``window'' philosophy underlying traditional computer graphics is still appropriate. It is this unstated assumption that we shall now investigate more critically.
Traditionally, in computer graphics, one sits about half a metre
in front of a rectangular graphics display of some kind.
The controlling
software generates images that are either inherently 2-dimensional
in nature; are ``2
The reader may, by now, be wondering why the author is pushing this point so strongly---after all, aren't displays just like traditional computer graphics displays? Well, as has already been shown in sections~<#1223#>BasicPhilosophy<#1223#> and~<#1224#>MinimalImplementation<#1224#>, this is not the case: new problems require new solutions; and, conversely, old dogs are not always even <#1225#>interested<#1225#> in new tricks (as the millions of still-contented DOS users can testify). But more than this, there is, <#1226#>and always will be<#1226#>, a fundamental difference between the field of <#1227#>planar<#1227#> Computer Graphics, and the subdiscipline of the field of that will deal with visual displays: namely, <#1228#>the human visual system is intrinsically curved<#1228#>: even without moving our heads, each of our eyes can view almost a complete hemisphere of solid angle. Now, for the purposes of traditional computer graphics, this observation is irrelevant: the display device is <#1229#>itself<#1229#> flat; thus, images <#1230#>must<#1230#> be computed as if projected onto a planar viewing plane. Whether the display itself is large or small, high-resolution or low, it is always a planar ``window'' for the viewer. But the aims of are completely different: we wish to <#1231#>immerse<#1231#> the participant, as completely as technically possible, in the virtual world. Now our terminology subtly shifts: since we ideally want to cover the <#1232#>entire<#1232#> almost-hemispherical field of view of each of the participant's eyes, the <#1233#>size<#1233#> and <#1234#>physical viewing distance<#1234#> of the display are irrelevant: all we care about are <#1235#>solid angles<#1235#>---not ``diagonal inches'', not ``pixels per inch'', but rather <#1236#>steradians<#1236#>, and <#1237#>pixels per radian<#1237#>. We are working in a new world; we are subject to new constraints; we have a gaggle of new problems---and, currently, only a handful of old solutions.
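A back-of-the-envelope comparison, with entirely hypothetical numbers, makes the shift of units concrete: a desk-mounted display packs its pixels into a small angle, while a wrap-around display must spread a comparable number of pixels over a far larger one.

    import math

    # Hypothetical numbers, for illustration only: a 1280-pixel-wide desktop
    # display 0.35 m wide viewed from 0.5 m, versus a head-mounted display
    # spreading 512 pixels across 120 degrees of horizontal field of view.

    def pixels_per_radian_flat(pixels_across, width_m, viewing_distance_m):
        angle = 2 * math.atan((width_m / 2) / viewing_distance_m)
        return pixels_across / angle

    def pixels_per_radian_wraparound(pixels_across, field_of_view_deg):
        return pixels_across / math.radians(field_of_view_deg)

    print(round(pixels_per_radian_flat(1280, 0.35, 0.5)))    # about 1900 pixels/radian
    print(round(pixels_per_radian_wraparound(512, 120.0)))   # about 240 pixels/radian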
``Surely,'' a sceptic might ask, ``can't one always cover one's field of view using a planar display, by placing the eye sufficiently `close' to it, and using optical devices to make the surface focusable?'' The theoretical answer to this question is, of course, in the affirmative: since the angular range viewable by the human eye is less than 180~degrees in any direction (a violation of which would require considerable renovations to the human anatomy), it <#1238#>is<#1238#>, indeed, always possible to place a plane of finite area in front of the eye such that it covers the entire field of view. However, theoretical answers are not worth their salt in the real world; our ultimate goal in designing a system is to <#1239#>maximise the convinceability<#1239#> of the virtual-world experience, given the hardware and software resources that are available to us.
How well, then, does the planar display configuration (as suggested by our sceptic above) perform in real life? To answer this question at all, quantitatively, requires consideration of the <#1240#>human<#1240#> side of the equation: What optical information is registrable by our eyes? The capabilities of the human visual system have, in fact, been investigated in meticulous detail; we may call on this research to give us precise quantitative answers to almost any question that we might wish to ask. It will, however, prove sufficient for our purposes to consider only a terribly simplistic subset of this information---not, it is hoped, offending too greatly those who have made such topics of research their life's work---to get a reasonable ``feel'' for what we must take into account. In order to do so, however, we shall need to have an accurate way of portraying <#1241#>solid angles<#1241#>. Unfortunately, it is intrinsically impossible to represent solid angles without distortion on a flat sheet of paper, much less by describing them in words. Seeing that we are still far from having `` on every desk'', it is also not currently possible to use the technology itself to portray this information. The reader is therefore asked to procure the following pieces of hardware so that a mathematical construction may be carried out: a mathematical compass---preferably one with a device that locks the arms after being set in position; one red and one blue ballpoint pen, that both fit in the compass; a texta (thick-tipped fibre marker), or a whiteboard marker; a ruler, marked in millimetres (or, in the US, down to at least sixteenths of an inch); a simple calculator; a pair of scissors, or a knife; an orange, or a bald tennis ball; a piece of string, long enough to circumnavigate the orange or tennis ball; and a roll of sticky tape. Readers that have had their recessionary research budgets cut so far that this hardware is prohibitively expensive should skip the next few pages.
The first task is to wrap a few turns of sticky tape around the sharp point of the compass's ``pivot'' arm, slightly extending past the point. This is to avoid puncturing the orange or tennis ball (as appropriate); orange-equipped readers that like orange juice, and do not have any objections to licking their apparatus, may omit this step.
The next task is to verify that the orange or tennis ball is as close to spherical as possible for objects of its type, and suitable for being written on by the ballpoint pens. If this proves to be the case, pick up the piece of string and wrap it around the orange or ball; do not worry if the string does not yet follow a great circle. Place your right thumb on top of the string near where it overlaps itself (but not on top of that point). With the fingers of your left hand, roll the far side of the string so that the string becomes more taut; let it slip under your right thumb only gradually; but make sure that no parts of the string ``grab'' the surface of the orange or ball (except where your right thumb is holding it!). After the rolled string passes through a great circle, it will become loose again (or may even roll right off). Without letting go with the right thumb, mark the point on the string where it crosses its other end with the texta. Now put the orange or ball down, cut the string at the mark, and dispose of the part that did not participate in the circumnavigation. Fold the string in half, pulling it taut to align the ends. At the half-way fold, mark it with the texta. Then fold it in half again, and mark the two unmarked folds with the texta. On unravelling the string, there should be three texta marks, indicating the quarter, half and three-quarter points along its length. Now pull the string taut along the ruler and measure its length. This is the circumference of the orange or ball; store its value in the calculator's memory (if it has one), or else write it down: we will use it later.
We now define some geographical names for places on our sphere,
by analogy with the surface of the Earth.
For orange-equipped readers, the
North Pole of the orange will be defined as the point where the stem
was attached.
For tennis-ball-equipped readers, a mark should be made arbitrarily
to denote the North Pole.
Mark this pole with a small `N' with the blue ballpoint pen.
Similarly, the small mark
From this point, it will be assumed, for definiteness, that the object
is an orange; possessors of tennis balls can project a mental image of
the texture of an orange onto the surface of their ball, if so desired.
Place the string around the orange, making sure the North and South
Poles are aligned.
(If possessors of oranges find, at this point, that the mark on the
orange is not at the half-way point on the string, then either mark
a new South Pole to agree with the string, or get another orange.)
Now find the one-quarter and three-quarter points on the string,
and use the texta to make marks on the orange at these two points.
Put the string down.
Choose one of the two points just marked---the one whose surrounding
area is most amenable to being written on.
This point shall be called <#1242#>Singapore<#1242#>
(being a recognisable city, near the Equator, close to the centre of
conventional planar maps of the world);
write a small `S' on the orange with the pen next to it.
(This mark cannot be confused with the South Pole, since it is not
diametrically opposite the North!)
The marked point diametrically opposite Singapore will be called <#1243#>Quito<#1243#>;
it may be labelled, but will not be used much in the following.
Next, wind the string around the orange, so that it passes through
all of the following points: the two Poles, Singapore and Quito.
Use the blue ballpoint pen to trace around the circumference, completing
a great circle through these points, taking care that the string is
accurately aligned; this circle will be referred to henceforth as the
<#1244#>Central Meridian<#1244#>.
(If the pen does not write, wipe the orange's surface dry, get the
ink flowing from the pen by scribbling on some paper, and try again.
Two sets of ballpoint pens can make this procedure easier.)
Now wrap the string around the orange, through the poles, but roughly
We are now in a position to start to relate this orangeography to our crude mapping of the human visual system. We shall imagine that the surface of the orange represents the solid angles seen by the viewer's eye, by imagining that the viewer's eye is located at the <#1247#>centre<#1247#> of the orange; the orange would (ideally) be a transparent sphere, fixed to the viewer's head, on which we would plot the extent of her view. Firstly, we shall consider the situation when the viewer is looking straight ahead at an object at infinity, with her head facing in the same direction. We shall, in this situation, define the direction of view ( the direction of <#1248#>foveal view<#1248#>---the most detailed vision in the centre of our vision) as being in the <#1249#>Singapore<#1249#> direction (with the North Pole towards the top of the head). One could imagine two oranges, side by side, one centred on each of the viewer's eyes, with both Singapores in the same direction; this would represent the viewer's direction of foveal view from each eye in this situation. Having defined a direction thus, the oranges should now be considered to be <#1250#>fixed<#1250#> to the viewer's head for the remainder of this section.
We now concentrate solely on the <#1251#>left<#1251#> eye of the viewer, and the corresponding orange surrounding it. We shall, in the following, be using the calculator to compute lengths on the ruler against which we shall set our compass; readers that will be using solar-powered credit-card-sized models, bearing in large fluorescent letters the words ``ACME CORPORATION---FOR ALL YOUR COMPUTER NEEDS'', should at this point relocate themselves to a suitably sunny location to avoid catastrophic system shut-downs. The compass, having been set using the ruler to the number spat out by the calculator, will be used both to measure ``distances'' between points inhabiting the curved surface of the orange, and to actually draw circles on the thing.
The first task is to compute how long 0.122 circumferences is.
(For example, if the circumference of the orange was 247~mm,
punch ``
It will be noted that, all in all, this solid angle of foveal view is not too badly ``curved'', when looked at in three dimensions. Place a <#1255#>flat plane<#1255#> (such as a book) against the orange, so that it touches the orange at Singapore. One could imagine cutting out the peel of the orange around this red circle, and ``flattening it'' onto the plane of the book without too much trouble; the outer areas would be stretched (or, if dry and brittle, would fracture), but overall the correspondence between the section of peel and the planar surface is not too bad. (Readers possessing multiple oranges, who do not mind going through the above procedure a second time, might actually like to try this peel-cutting and -flattening exercise.) The surface of the plane corresponding to the flattened peel corresponds, roughly, to the maximum (apparent) size that a traditional, desk-mounted graphics screen can use: any larger than this and the user would need to <#1256#>move her head<#1256#> to be able to focus on all parts of the screen---a somewhat tiring requirement for everyday computing tasks. Thus, it can be seen that it is <#1257#>the very structure of our eyes<#1257#> that allows flat display devices to be so successful: any device subtending a small enough solid angle that all points can be ``read'' ( viewed in fine detail) without gross movements of one's head cannot have problems of ``wrap-around'' anyway.
systems, of course, have completely different goals: the display device is not considered to be a ``readable'' object, as it is in traditional computing environments---rather, it is meant to be a convincing, <#1258#>completely immersive<#1258#> stimulation of our visual senses. In such an environment, <#1259#>peripheral vision<#1259#> is of vital importance, for two reasons. Firstly, and most obviously, the convinceability of the session will suffer if the participant ``has blinkers on'' (except, of course, for the singular case in which one is trying to give normally-sighted people an idea of what various sight disabilities look like from the sufferer's point of view). Secondly, and quite possibly more importantly, is the fact that, although our peripheral vision is no good for <#1260#>detailed<#1260#> work, it is especially good at detecting <#1261#>motion<#1261#>. Such feats are not of very much use in traditional computing environments, but are vital for a participant to (quite literally) ``get one's bearings'' in the spatially-extended immersive environment. We must therefore get some sort of feeling---using our orange-mapped model---for the range of human peripheral vision, so that we might cater for it satisfactorily in our hardware and software implementations.
The following construction will be a little more complicated than the earlier ones; a ``check-list'' will be presented at the end of it so that the reader can verify that it has been carried out correctly. Firstly, set the compass tip-separation to a distance (measured, as always, on the ruler) equal to 0.048 circumferences (a relatively small distance). Place the pivot on Singapore, and mark off the position to the <#1262#>east<#1262#> (right) of Singapore where the pen arm intersects the Equator. We are now somewhere near the Makassar Strait. Now put the <#1263#>blue<#1263#> ballpoint pen into the compass, and set its tip-to-tip distance to 0.2 circumferences; this is quite a large span. Now, <#1264#>with the pivot on the new point marked in the Makassar Strait<#1264#>, carefully draw a circle on the orange. The large portion of solid angle enclosed by this circle represents, in rough terms, the range of our peripheral vision. To check that the construction has been carried out correctly, measure the following distances, by placing the compass tips on the two points mentioned, and then measuring the tip-to-tip distance on the ruler: From Singapore to the point where this freshly-drawn circle cuts the Central Meridian (either to the north or south): about 0.18 circumferences; from Singapore to the point where the circle cuts the Equator to the <#1265#>west<#1265#>: about 0.16 circumferences; from Singapore to the point where the circle cuts the Equator to the <#1266#>east<#1266#>: about 0.23 circumferences. If these are roughly correct, one can, in fact, also check that the <#1267#>earlier<#1267#>, foveal construction is correct, by measuring the tip-to-tip distance between the red and blue circles where they cross the Equator and Central Meridian. To the west, this distance should be about 0.04 circumferences; to the east, about 0.13 circumferences; to the north and south, about 0.08 circumferences.
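For readers who would rather not juggle fruit, the construction can also be checked numerically. Assuming that each compass setting is a straight-line (chord) distance expressed as a fraction of the measured circumference C = 2πR, the corresponding angular radius follows from chord = 2R sin(angle/2); the three settings used above then come out at roughly 45, 17 and 78 degrees respectively.

    import math

    # Convert a compass setting, given as a fraction f of the circumference
    # C = 2*pi*R, into the angle it subtends at the centre of the sphere,
    # assuming the tip-to-tip separation is a straight-line chord.

    def chord_fraction_to_angle_deg(f):
        return math.degrees(2 * math.asin(math.pi * f))

    for f in (0.122, 0.048, 0.2):
        print(f, round(chord_fraction_to_angle_deg(f), 1))
    # 0.122 -> about 45 degrees  (the red, "foveal-range" circle)
    # 0.048 -> about 17 degrees  (offset of the peripheral-field centre)
    # 0.2   -> about 78 degrees  (the blue, peripheral-field circle)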
There are several features of the range of our peripheral vision that
we can now note.
Firstly, it can be seen that our eyes can actually look
a little <#1268#>behind<#1268#> us---the left eye to the left, and the right
eye to the right.
(Wrap the string around the orange, through the poles,
Secondly, we note that a reasonable approximation is to consider
the point near the Makassar Strait to be the ``centre'' of a circle
of field of view for the left eye.
(This unproved assertion by the author may be verified as approximately
correct by an examination of ocular data.)
The field of view extends about
Let us reconsider, nevertheless,
our earlier idea of placing a planar display
device in front of each eye (with appropriate optics for focusing).
How would such a device perform?
Let us ignore, for the moment, the actual size of the device, and merely
consider the <#1275#>density of pixels<#1275#> in any direction of sight.
For this purpose, let us assume that the pixels are laid out on a
regular rectangular grid (as is the case for real-life display devices).
Let the distance between the eye and the plane (or the
effective plane, when employing optics) be R pixel widths ( we
are using the pixel width as a unit of distance, not only in the
display plane, but also in the orthogonal direction).
Let us measure first along the x axis of the display, which we shall
take to have its origin at the point on the display
pierced by a ray going through the eye and the Makassar Strait point
(which may be visualised by placing a plane, representing the display,
against the orange at the Makassar Strait).
Let us imagine that we have a head-mountable display of resolution (say)
The answer comes from considering, as a simple example, the very
<#1290#>edgemost<#1290#> pixel in the x-direction on the display---the one that is
The reason for this waste of resolution, of course, is that we have tried to stretch a planar device across our field of view. What is perhaps not so obvious is the fact that no amount of technology, no amount of optical trickery, can remove this problem: <#1304#>it is an inherent flaw in trying to use a planar display as a flat window on the virtual world<#1304#>. This point is so important that it shall be repeated in a slightly different form: <#1305#>Any rendering algorithm that assumes a regularly-spaced planar display device will create a central pixel 33 times the size of a peripheral pixel under conditions.<#1305#> This is not open for discussion; it is a mathematical fact.
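The arithmetic behind that factor is easily sketched. For a flat display viewed from a point on its axis, an equal-area pixel seen at an angle theta off-axis subtends a solid angle proportional to cos(theta) cubed: one cosine for the foreshortening of the pixel, two more for its increased distance. The half-angle of roughly 72 degrees used below is an assumed illustrative figure for the monocular field discussed earlier, and it reproduces the factor of about 33.

    import math

    # Ratio of the solid angle of the central pixel to that of a pixel seen
    # theta degrees off-axis, for a uniformly-gridded planar display.

    def solid_angle_ratio(theta_deg):
        return 1.0 / math.cos(math.radians(theta_deg)) ** 3

    print(round(solid_angle_ratio(72.0), 1))   # -> ~33.9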
Let us, however, consider whether we might not, ultimately, avoid
this problem by simply producing a higher-resolution display.
Of course, one can always compensate for the factor of four
degradation by increasing the linear resolution of the device
in each direction by this factor.
However, there is a more disturbing
psychological property of the planar display:
pixels near the centre of view
seem chunkier than those further out; it becomes psychologically
preferable to <#1306#>look askew<#1306#> at objects, using the eye muscles
together with the increased outer resolution to get a better view.
This property is worrying, and could, potentially, cause eye-strain.
To avoid this side-effect, we would need to have the
<#1307#>entire<#1307#> display of such a high resolution that even the central
pixel (the worst one) is below foveal resolution.
We earlier showed that a 512-pixel-wide planar display produces
a central pixel angular length of about
The clear message from this line of thought is the following: <#1310#>planar-display rendering has no future in <#1310#>. Assumption of a planar view-plane is, in traditional computer graphics applications, convenient: the perspective view of a line is again a line; the perspective view of a polygon is also a polygon. Everything is cosy for algorithm-inventors; procedures can be optimised to a great extent. But its performance is simply unacceptable for wrap-around display devices. We must farewell an old friend.
How, then, are we to proceed? Clearly, designers of current systems have not been hamstrung by this problem: there must be a good trick or two that makes everything reasonable again, surely? And, of course, there is: one must map (using optics) the rectangular display devices that electronics manufacturers produce in an <#1311#>intelligent<#1311#> way onto the solid angle of the human visual field. One must not, however, be fooled into thinking that any amount of sophisticated optics will ever ``solve'' the problem by itself. The mapping will also transform the <#1312#>logical<#1312#> rectangular pixel grid that the <#1313#>rasterisation software<#1313#> uses, in such a way that (for example) polygons in physical space will <#1314#>not<#1314#> be polygons on the transformed display device. (The only way to have a polygon stay a polygon is to have a planar display, which we have already rejected.)
Let us now consider how we should like to map the physical display device onto the eye's field of vision. Our orange helps us here. All points on the surface within the red circle should be at <#1315#>maximum solid-angle resolution<#1315#>, as our foveal vision can point in any direction inside this circle. However, look at the skewed strip of solid angle that this leaves behind (that is, those solid angles that are seen in peripheral vision, but not in foveal vision; the area between the red and blue circles): it is not overwhelming. Would there be much advantage in inventing a transformation that left a <#1316#>lower<#1316#>-resolution coverage in this out-lying area, to simulate more closely what we can actually perceive? Perhaps; but it is the opinion of the author that simply removing the distortions of planar display viewing should be one's first consideration. Let us therefore simply aim to achieve a <#1317#>uniform solid-angle resolution in the entire field of view<#1317#>. Note carefully that this is <#1318#>not<#1318#> at all the same as directly viewing a planar display that itself has a uniform resolution across its planar surface---as has been convincingly illustrated above. Rather, we must obtain some sort of smooth (and, hopefully, simple) <#1319#>mapping<#1319#> of the one to the other, which we will implement physically with optics, and which must be taken into account mathematically in the rendering software.
How, then, does one map a planar surface onto a spherical one? Cartographers have, of course, been dealing with this problem for centuries, although usually with regard to the converse: how do you map the surface of the earth onto a flat piece of paper? Of the hundreds of cartographical projections that have been devised over the years, we can restrict our choices immediately, with some simple considerations. Firstly, we want the mapping to be smooth everywhere, out to the limits of the field of view, so that we can implement it with optical devices; thus, ``interrupted'' projections ( those with slice-marks that allow the continents to be shown with less distortion at the expense of the oceans) can be eliminated immediately. Secondly, we want the projection to be an <#1320#>equal-area<#1320#> one: equal areas on the sphere should map to equal areas on the plane. Why? Because this will then mean that <#1321#>each<#1321#> pixel on the planar display device will map to the <#1322#>same<#1322#> area of solid angle---precisely what we have decided that we want to achieve.
OK, then, the cartographers can provide us with a large choice of uninterrupted, equal-area projections. What's the catch? This seems too easy! The catch is, of course, that while all of the square pixels <#1323#>will<#1323#> map to an equal area of solid angle, they will <#1324#>not<#1324#> (and, indeed, mathematically <#1325#>cannot<#1325#>) all map to square-shapes. Rather, all but a small subset of these pixels will be distorted into diamond-like or rectangular-like shapes (the suffix <#1326#>-like<#1326#> being used here because the definition of these quantities on a curved surface is a little complicated; but for small objects like pixels one can always take the surface to be locally flat). Now, if our rendering software were to think that it was still rendering for a <#1327#>planar<#1327#> device, this distortion would indeed be real: objects would be seen by the participant to be warped and twisted, and not really what would be seen in normal perspective vision. However, if we have suitably briefed our rendering software about the transformation, then the image <#1328#>can<#1328#> be rendered free of distortion, by simply ``undoing'' the effect of the mapping. Again we ask: what <#1329#>is<#1329#> the catch?
The catch---albeit a more subtle and less severe one now---is that the directional resolution of the device will not be homogeneous or isotropic. For example, if a portion of solid angle is covered by a stretched-out rectangular pixel, the local resolution in the direction of the shorter dimension is higher than that in the longer direction. We have, however, insisted on an equal-area projection; therefore, the ``lengths'' of the long and short dimensions must multiply together to the same product as for any other pixel. This means that the <#1330#>geometric mean<#1330#> of the local resolutions in each of these two directions is a constant, independent of where we are in the almost-hemisphere of solid angle: that is, the square root of {the pixels-per-radian in one direction} times {the pixels-per-radian in the orthogonal direction}, evaluated at <#1331#>any<#1331#> angle of our field of view, will be some constant number that characterises the resolution quality of our display system. This is what an equal-area projection gives us.
OK then, we ask, of all the uninterrupted equal-area projections that
the cartographers have devised, is there any one that does <#1332#>not<#1332#>
stretch shapes of pixels in this way?
The cartographer's answer is, of course, no:
that is the nature of mapping between
surfaces of different intrinsic curvature; you can only get rid of
some problems, but not all of them.
However, while there is <#1333#>no<#1333#> way to obtain a distortion-free projection,
there are, in fact, an <#1334#>infinite<#1334#> number of ways we could implement
simply an equal-area projection.
To see that, it is sufficient to consider the coordinates of the
physical planar display device,
which we shall call X, Y and Z (where Z-buffering is employed),
as functions of the spherical coordinates r, θ and φ.
Let us, therefore, consider again the human side of the equation.
Our visual systems have evolved in an environment
somewhat unrepresentative
of the Universe as a whole:
gravity pins us to the surface of the planet, and drags everything
not otherwise held up downwards; our evolved
anatomy requires that we are, most of the time, in the same
``upright'' position against gravity;
our primary sources of
illumination (Sun, Moon, planets) were always in the ``up'' direction.
It is therefore not surprising that our visual senses do not
interpret the three spatial directions in the same way.
In fact, we often tend to view things in a somewhat ``2
Consider, furthermore, the motion of our head and eyes. The muscles in our eyes are attached to pull in either the horizontal or vertical directions; of course, two may pull at once, providing a ``diagonal'' motion of the eyeball, but we most frequently look either up--down <#1367#>or<#1367#> left--right, as a rough rule. Furthermore, our necks have been designed to allow easy rotation around the vertical axis (to look around) and an axis parallel to our shoulders (to look up or down); we can also cock our heads of course, but this is less often used; and we can combine all three rotations together, although this may be a bone-creaking (and, according to current medical science, dangerous) pastime.
Thus, our <#1368#>primary<#1368#> modes of visual movement are left--right (in
which we expect to see a planar-symmetrical world) and up--down
(to effectively ``scan along'' the planes).
Although this is a terribly simplified description, it gives us
enough information to make at least a reasonable choice of equal-area
projection for use.
Consider building up such a mapping pixel-by-pixel.
Let us start in the <#1369#>centre<#1369#> of our almost-hemispherical
solid angle of vision
(the Makassar Strait on our orange), which is close
to---but not coincident with---the
direction of straight-ahead view (Singapore on our orange).
Imagine that we place a ``central pixel'' there,
in the Makassar Strait.
(By ``placing a pixel'' we mean placing the mapping of the
corresponding square pixel of the planar display.)
In accordance with our horizontal-importance philosophy, let us simply
continue to place pixels around the Equator, side by side, so
that there is <#1370#>an equal, undistorted density of pixels<#1370#> around it.
This is what our participant would see if looking directly left or
right from the central viewing position; it would seem
(at least along that line) nice and
regular.
(We need only stack them as far as
Now let us do the same thing in the <#1373#>vertical<#1373#> direction, stacking
up a single-pixel wide column starting at the Makassar Strait, and
heading towards the North Pole; and similarly towards the South Pole.
(Again, we can stop short
Can we continue to place any more pixels in a distortion-free way? Unfortunately, we cannot; we have used up all of our choices of distortion-free lines. How, then, do we proceed? Let us try, at least, to maintain our philosophy of splitting the field of view into <#1377#>horizontal planes<#1377#>. Consider going around from the Makassar Strait, placing square pixels, as best we can, in a ``second row'' above the row at the Equator (and, symmetrically, a row below it also). This will not, of course, be 100% successful: the curvature of the surface means there must be gaps, but let us try as best we can. How many pixels will be needed in this row? Well, to compute this roughly, let us approximate the field of view, for the moment, by a complete hemisphere; we can cut off the extra bits later. Placing a second row of pixels on top of the first amounts to traversing the globe at a <#1378#>constant latitude<#1378#>, travelling along a Parallel of latitude. This Parallel is <#1379#>not<#1379#> the shortest surface distance between two points, of course; it is rather the intersection between the spherical surface and a <#1380#>horizontal plane<#1380#>---precisely the object we are trying to emulate. Now, how long, in terms of surface distance, is this Parallel that our pixels are traversing? Well, some simple solid geometry and trigonometry shows that the length of the (circular) perimeter of the Parallel of latitude θ is simply C cos θ, where C is the circumference of the sphere. Thus, in some sort of ``average'' way, the number of pixels we need for the second row will be smaller, by a factor of cos θ, than for the Equator, if we are looking at a full hemisphere of view. This corresponds, on the (X, Y) device, to the horizontal line of pixels cutting the Y-axis at the value Y = 1 pixel extending a distance roughly a factor of cos θ <#1381#>shorter<#1381#> in the X direction than was the case for the Equatorial pixels (which mapped to the line Y = 0). It is clear that the shape we are filling up on the (X, Y) device is <#1382#>not<#1382#> a rectangle; this point will be returned to shortly.
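A thumbnail sketch of the row-counting just performed (with a hypothetical Equatorial resolution) is the following; the cos θ factor is all that is needed.

    import math

    # Number of pixels needed for the row of pixels at latitude theta, for an
    # assumed 512-pixel Equatorial row and a full hemisphere of view.

    def pixels_in_row(equator_pixels, theta_deg):
        return max(1, round(equator_pixels * math.cos(math.radians(theta_deg))))

    for theta in (0, 30, 60, 85, 90):
        print(theta, pixels_in_row(512, theta))
    # 0 -> 512, 30 -> 443, 60 -> 256, 85 -> 45, 90 -> 1 (the Pole itself)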
We can now repeat the above procedure again and again,
placing a new row of pixels
(as best will fit)
above and below the Equator at successively polar latitudes; eventually,
we reach the Poles, and only need a single pixel on each Pole itself.
We have now completely covered the
entire forward hemisphere of field of view, with pixels
smoothly mapped from a regular physical display, according to our
chosen design principles.
What is the precise mathematical relationship between (X, Y) and (θ, φ)?
Now, our method above <#1397#>should<#1397#> have produced an equal-area
mapping---after all, we built it up by notionally placing display
pixels directly on the curved surface!
But let us nevertheless verify mathematically that the
equal-area criterion, equation~<#1398#>EqualAreaCrit<#1398#>, <#1399#>is<#1399#>
indeed satisfied by the transformation equations~<#1400#>XYFromThetaPhi<#1400#>.
Clearly, on partial-differentiating equations~<#1401#>XYFromThetaPhi<#1401#>,
we obtain
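Assuming the transformation takes the standard sinusoidal (Sanson--Flamsteed) form adopted later in this section, X = φ cos θ and Y = θ, with θ the latitude and φ the longitude measured from the Central Meridian, a few lines of symbolic algebra confirm the equal-area property: the Jacobian has magnitude cos θ, which is exactly the area element of the unit sphere in these coordinates.

    import sympy as sp

    # Assumed form of the (theta, phi) -> (X, Y) transformation: the
    # Sanson--Flamsteed (sinusoidal) projection, X = phi*cos(theta), Y = theta.

    theta, phi = sp.symbols('theta phi', real=True)
    X = phi * sp.cos(theta)
    Y = theta

    J = sp.Matrix([[sp.diff(X, theta), sp.diff(X, phi)],
                   [sp.diff(Y, theta), sp.diff(Y, phi)]]).det()

    print(sp.simplify(J))   # -> -cos(theta); its magnitude is cos(theta) on the
                            #    forward hemisphere, so equal solid angles map
                            #    to equal display areas.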
The transformation <#1408#>XYFromThetaPhi<#1408#> is a relatively simple one.
The forward hemisphere of solid angle of view is mapped to a portion
of the display device whose shape is simply a <#1409#>pair of back-to-back
sinusoids<#1409#> about the Y-axis, as may be verified by plotting all of the
points in the (X, Y) plane
corresponding to the edge of the hemisphere, namely,
those corresponding to
The astute reader may, by now, have asked the question, ``Isn't it
silly to just use a sinusoidal swath of our display device---we're
wasting more than 36% of its pixels?''
There is, however, a more subtle reason why, in fact, not using 36% of the display can be a <#1414#>good<#1414#> thing. Consider a consumer electronics manufacturer fabricating small, light, colour LCD displays, for such objects as camcorders. Great advances are being made in this field regularly; it is likely that ever more sophisticated displays will be the norm for some time. Consider what happens when a <#1415#>single LCD pixel<#1415#> is faulty upon manufacture (more likely with new, ground-breaking technology): the device is of no further commercial use, because all consumer applications for small displays need the full rectangular area. There is, however, roughly a 36% chance that this faulty pixel falls <#1416#>outside<#1416#> the area necessary for a head-mounted display---and is thus again a viable device! If the electronics manufacturer is simultaneously developing commercial systems, here is a source of essentially free LCD displays: the overall bottom line of the corporation is improved. Alternatively, other hardware manufacturers may negotiate a reasonable price with the manufacturer for purchasing these faulty displays; this both cuts costs of the display hardware (especially when using the newest, most expensive high-resolution display devices---that will most likely have the highest pixel-fault rate), and provides income to the manufacturer for devices that would otherwise have been scrapped. Of course, this mutually beneficial arrangement relies on the supply-and-demand fact that the consumer market for small LCD display devices is huge compared to that of the current industry, and, as such, will not remain so lucrative when hardware outstrips the camcorder market; nevertheless, it may be a powerful fillip to the industry in the fledgling form that it is in today.
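The ~36% figure itself is a one-line calculation: the sinusoidal swath for the forward hemisphere occupies the region |X| ≤ (π/2) cos Y inside its bounding rectangle of width π and height π, so the used fraction is 2/π and the unused fraction is 1 - 2/π, or about 0.36. The pixel-counting loop below is merely an independent (and entirely illustrative) numerical check.

    import math

    print(round(1.0 - 2.0 / math.pi, 3))   # -> 0.363: the unused fraction

    # Independent check by counting pixels on a hypothetical 512 x 512 grid
    # laid over the bounding rectangle of the hemisphere's sinusoidal swath.
    nx, ny, unused = 512, 512, 0
    for j in range(ny):
        y = (j + 0.5) / ny * math.pi - math.pi / 2        # latitude of this row
        half_width = (math.pi / 2) * math.cos(y)          # swath half-width here
        for i in range(nx):
            x = (i + 0.5) / nx * math.pi - math.pi / 2
            if abs(x) > half_width:
                unused += 1
    print(round(unused / (nx * ny), 3))                   # -> ~0.363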
It is now necessary to consider the full transformation from the
physical space of the virtual world, measured in the Cartesian
system (x, y, z), to the space of the physical planar display
device, (X, Y, Z) (where Z will now be used for the Z-buffering
of the physical display and its video controller).
In fact, the only people who will ever care about the intermediate
spherical system,
What, then, is the precise relationship between (X, Y, Z) space and (x, y, z) space? To determine this, we need to have defined conventions for the (physically fixed) (x, y, z) axes themselves in the first place. But axes that are <#1419#>fixed<#1419#> in space are not very convenient for <#1420#>head-mounted<#1420#> systems; let us, therefore, define a <#1421#>second<#1421#> set of Cartesian axes (u, v, w), fixed to the participant's head, whose (linear) transformation from the fixed (x, y, z) system consists of the translation and rotation corresponding to the participant's head position and orientation.
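A sketch of that head transformation, in software form and with entirely hypothetical names and conventions (handedness, row-versus-column vectors and the source of the tracking data are all implementation choices), might read as follows.

    # Transform a point from the fixed virtual-world axes (x, y, z) into the
    # head-fixed axes (u, v, w): subtract the tracked head position, then
    # apply the inverse of the tracked head rotation.  Here the rows of
    # head_rotation are taken to be the head's u, v and w axes expressed in
    # world coordinates, so multiplying by it applies that inverse rotation.

    def world_to_head(point_xyz, head_position, head_rotation):
        translated = [p - h for p, h in zip(point_xyz, head_position)]
        return [sum(r * c for r, c in zip(row, translated))
                for row in head_rotation]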
At this point, however, we note
a considerable complication: the line of vertical symmetry
in the (X, Y) plane has (necessarily)
been taken to be through the Makassar
Strait direction---which is in a <#1422#>different<#1422#>
physical direction for each eye.
Therefore, let us define, not one, but <#1423#>two<#1423#> new intermediate
sets of Cartesian axes in (virtual) physical space,
With these conventions, the
There is still, however,
the question of deciding what functional form Ze
will take, in terms of the spherical coordinates.
Finally,
linking the transformations <#1466#>XYZFromUVW<#1466#>
to the virtual-world <#1467#>fixed<#1467#>
physical system, (x, y, z), requires a transformation from the
Makassar-centred system
A final concern for the use of the Sanson--Flamsteed projection (or, indeed, any other projection) in is to devise efficient rendering algorithms for everyday primitives such as lines and polygons. Performing such algorithmic optimisation is an artform; the author would not presume to intrude on this intricate field. However, a rough idea of how such non-linear mappings of lines and polygons might be handled is to note that <#1470#>sufficiently small<#1470#> sections of such objects can always be reasonably approximated by lines, parabolas, cubics, and so on. A detailed investigation of the most appropriate and efficient approximations, for the various parts of the solid angle mapped by the projection in question (which would, incidentally, become almost as geographically ``unique'', in the minds of algorithmicists, as places on the real earth), would only need to be done once, in the research phase, for a given practical range of display resolutions; rendering algorithms could then be implemented that have this information either hard-wired or hard-coded. It may well be useful to slice long lines and polygons into smaller components, so that each component can be approximated to pixel resolution accurately, yet simply. All in all, the problems of projective, perspective rendering are not insurmountable; they simply require sufficient (preferably non-proprietary-restricted) research and development.
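A rough, purely illustrative sketch of the slice-and-approximate idea (helper names and conventions assumed, and straight segments standing in for the parabolic or cubic pieces suggested above) might run as follows.

    import math

    # Map a head-relative direction onto sinusoidal display coordinates, with
    # an assumed convention of z forward, x to the right and y up; then render
    # a world-space line by subdividing it and projecting each sub-point.

    def direction_to_display(dx, dy, dz):
        theta = math.asin(dy / math.sqrt(dx * dx + dy * dy + dz * dz))  # latitude
        phi = math.atan2(dx, dz)                                        # longitude
        return phi * math.cos(theta), theta                             # (X, Y)

    def project_line(p0, p1, pieces=16):
        """Return display-space points approximating the segment p0 -> p1;
        consecutive points are then joined by short straight (or curved)
        display-space segments."""
        points = []
        for i in range(pieces + 1):
            t = i / pieces
            p = [a + t * (b - a) for a, b in zip(p0, p1)]
            points.append(direction_to_display(*p))
        return points

    # Example: a metre-long edge two metres in front of the eye.
    pts = project_line([-0.5, 0.0, 2.0], [0.5, 0.0, 2.0])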
If you thought the science of computer graphics was a little warped before, then you ain't seen nothing yet.
<#1471#>LocalUpdate<#1471#><#1472#>Local-Update Display Philosophy<#1472#> Having diverted ourselves for a brief and fruitful (sorry) interlude on head-mounted display devices, we now return to the principal topic of this : ing, and its implementation in practical systems. We shall, in this final section, critically investigate the <#1473#>basic philosophy<#1473#> underlying current image generation---which was accepted unquestioningly in the minimal implementation described in section~<#1474#>MinimalImplementation<#1474#>, but which we must expect to be <#1475#>itself<#1475#> fundamentally influenced by the use of ing.
Traditionally, perspective 3-dimensional computer graphics has been performed on the ``clean slate'' principle: one erases the frame buffer, draws the image with whatever sophistication is summonable from the unfathomable depths of ingenuity, and then projects the result onto a physical device, evoking immediate spontaneous applause and calls of ``Encore! Encore!''. This approach has, to date, been carried across largely unmodified into the environment, but with the added imperative: get the bloody image out within 100~milliseconds! This is a particularly important yet onerous requirement: if the participant is continually moving around (the general case), the view is continually changing, and it must be updated regularly if the participant is to function effectively in the virtual world at all.
With ing, however, our image generation philosophy may be profitably shifted a few pixels askew. As outlined in section~<#1476#>MinimalImplementation<#1476#>, a <#1477#>Galilean<#1477#> update of the image on the display gives the video controller sufficient information to move each displayed object reasonably accurately for a certain period of time. This has at least one vitally important software ramification: <#1478#>the display processor no longer needs to worry about churning out complete images simply to simulate the effect of motion<#1478#>; to a greater or lesser extent, objects will ``move themselves'' around on the display, ``unsupervised'' by the display processor. This suggests that the whole philosophy of the image generation procedure be subtly changed, but changed all the way to its roots: <#1479#>Only objects whose self-propagating displayed images are significantly out-of-date should be updated<#1479#>. Put another way, we can now organise the image generation procedure in the same way that we (usually!) organise our own lives: urgent tasks should be done NOW; important tasks should be done <#1480#>soon<#1480#>; to-do-list tasks should be picked off when other things aren't so hectic.
How, then, would one go about implementing such a philosophy, if one were building a brand-new system from the ground up? Firstly, the display-processor--video-controller interface should be designed so that updates of only <#1481#>portions<#1481#> of the display can be cleanly <#1482#>grafted<#1482#> onto the existing self-propagating image; in other words, <#1483#>local updates<#1483#> must be supported. Secondly, this interface between the display processor and the video controller---and, indeed, the whole software side of the image-generation process---must have a reliable method of ensuring <#1484#>timing and synchronisation<#1484#>. Thirdly, the display processor must be redesigned for <#1485#>reliable rendering<#1485#> of local-update views: the laxity of the conventional ``clean the slate, draw the lot'' computer graphics philosophy must be weeded out. Fourthly, it would be most useful if updates for <#1486#>simple motions of the participant herself<#1486#> could be catered for automatically, by specialised additions to the video controller hardware, so that entire views need not be regenerated simply because the participant has started jerking around a bit. Fifthly, some sort of <#1487#>caching<#1487#> should be employed on the pixelated display image, to further reduce strain on the process of generating fresh images. Finally, and ultimately most challengingly, the core operating system, and the applications it supports, must be structured to exploit this new hardware philosophy fully.
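As a thumbnail of the first two of these requirements only (everything here is hypothetical and schematic), a local update might be delivered as a time-stamped patch of pixels plus their Galilean motion data, and grafted onto the self-propagating image only when its stamped time arrives.

    # Schematic sketch: time-stamped local updates grafted onto the display.

    class LocalUpdate:
        def __init__(self, x, y, pixels, motion, valid_at):
            self.x, self.y = x, y      # top-left corner of the patch
            self.pixels = pixels       # 2-D list of colour values
            self.motion = motion       # matching 2-D list of per-pixel motion
            self.valid_at = valid_at   # the frame time the patch was drawn for

    def graft_due_updates(frame_buffer, motion_buffer, pending, now):
        for update in [u for u in pending if u.valid_at <= now]:
            for dy, (prow, mrow) in enumerate(zip(update.pixels, update.motion)):
                for dx, (pix, mot) in enumerate(zip(prow, mrow)):
                    frame_buffer[update.y + dy][update.x + dx] = pix
                    motion_buffer[update.y + dy][update.x + dx] = mot
            pending.remove(update)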
Let us deal with these aspects in turn. We shall not, in the following discussion, prescribe solutions to these problems in excessive technical detail, as performing this task optimally can only be done by the designer of each particular implementation.
First on our list of requirements is that <#1488#>local updates<#1488#> of the display be possible. To illustrate the general problem most clearly, imagine the following scenario: A participant is walking down the footpath of a virtual street, past virtual buildings, looking down the alley-ways between them as she goes. Imagine that she is walking down the left-hand footpath of the street, on the same side of the road as a car would (in civilised countries, at least). Now imagine that she is passing the front door of a large, red-brick building. She does not enter, however; rather, she continues to stroll past. As she approaches the edge of the building's facade, she starts to turn her head to the left, in anticipation of looking down the alley-way next to the building. At the precise instant the edge of the facade passes her...press an imaginary ``pause'' button, and consider the session to date from the system's point of view.
Clearly, as our participant was passing the front of the building, its facade was slipping past her view smoothly; its apparent motion was not very violent at all; we could quite easily redraw most of it at a fairly leisurely rate, relying on Galilean antialiasing to make it move smoothly---and the extra time could be used to apply a particularly convincing set of Victorian period textures to the surfaces of the building (which would propagate along with the surfaces they are moulded to). We are, of course, here relying on the relatively mundane motion of the virtual objects in view as a <#1489#>realism lever<#1489#>: these objects can be rendered less frequently, but more realistically. And this is, indeed, precisely what one <#1490#>does<#1490#> want from the system: a casual stroll is usually a good opportunity to ``take a good look at the scenery'' (although mankind must pray that, for the good of all, no one ever creates a Virtual Melbourne Weather module).
Now consider what happens at the point in time at which
we freeze-framed our
session.
Our participant is just about to look down an alley-way: she doesn't
know what is down there; passing by the edge of the facade will
let her have a look.
The only problem is that
<#1491#>the video-controller doesn't know what's down there
either<#1491#>: the facade of the building Galileanly moves out of
the way, leaving...well, leaving nothing; the video controller, for want of something better, falls back to displaying whatever debris has been left lying around at those positions in the frame buffer.
So what <#1494#>should<#1494#> be visible when our participant looks down the alley? Well, a whole gaggle of objects may just be coming into view: the side wall of the building; the windows in the building; the pot-plants on the windowsills; a Japanese colleague of our participant who has virtu-commuted to her virtual building for a later meeting, who is right now sipping a coffee and happily waving out the window at her; and so on. But none of this is yet known to the video controller: a massive number of objects simply do not exist in the frame buffer at all. We shall say that these objects <#1495#>have gone information-critical<#1495#>: they are Priority One: something needs to be rendered, and rendered NOW.
How does the system carry out this task? To answer this, it is necessary to examine, in a high-level form, how the entire system will work as a whole. To see what would occur in a <#1496#>well-designed<#1496#> system at the moment we have freeze-framed, we need to wind back the clock by about a quarter of a second. At that earlier time, the operating system, constantly projecting the participant's trajectory forward in time, had realised that the right side wall of building #147 would probably go critical in approximately 275 milliseconds. It immediately instigated a Critical Warning sequence, informing all objects in the system that they may shortly lose a significant fraction of display processing power, and should take immediate action to ensure that their visual images are put into stable, conservative motion as soon as possible. The right wall of building #147 is informed of its Critical Warning status, as well as the amount of extra processing power allocated to it; the wall, in turn, proceeds to carry out its pre-critical tasks: a careful monitoring of the participant's extrapolated motion and critical time estimate; a determination of just precisely which direction of approach has triggered this Critical Warning; a computation of estimates of the trajectories of the key control points of the wall and its associated objects; and so on. By the time 150~milliseconds have passed, most objects in the system have stabilised their image trajectories. The right wall of building #147 has decided that it will now definitely go critical in 108~milliseconds, and requests the operating system for Critical Response status. The operating system concurs, and informs the wall that there are no other objects undergoing Critical Response, and only one other object on low-priority Warning status; massive display-processing power is authorised for the wall's use, distributed over the next 500 milliseconds. The wall immediately refers to the adaptive system performance table, and makes a conservative estimate of how much visual information about the wall and associated objects it can generate in less than 108 milliseconds. It decides that it can comfortably render the gross features of all visible objects with cosine-shaded polygons; and immediately proceeds to instruct the display processor with the relevant information---not the positions, velocities, accelerations, colours and colour derivatives of the objects as they are <#1497#>now<#1497#>, but rather where they will be <#1498#>at the critical time<#1498#>. It time-stamps this image composition information with the projected critical time, which is by this time 93 milliseconds into the future; and then goes on to consider how best it can use its authorised <#1499#>post-critical<#1499#> resources to render a more realistic view. While it is doing so---and while the other objects in the system monitor their own status, mindful of the Critical Response in progress---the critical time arrives. The pixelated wall---with simple shaded polygons for windows, windowsills, pot-plants, colleagues---which the display processor completed rendering about 25 milliseconds ago, is instantaneously grafted onto the building by the video controller; the participant looks around the corner and sees...a wall!
200 milliseconds later---just as she is getting a good look---the video controller grafts on a new rendering of the wall and its associated objects: important fine details are now present; objects are now Gouraud-shaded; her colleague is recognisable; the coffee cup has a handle. And so she strolls on...what an uneventful and relaxing day this is, she thinks.
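To make the flavour of this scenario a little more concrete, the following C sketch (with purely illustrative figures, and a crude constant-acceleration model of the participant's stroll) shows the kind of trigger computation the operating system might perform when projecting the participant's trajectory forward to estimate the critical time of the hidden wall:

/* A sketch (illustrative numbers and names only) of predictive trigger
 * logic: the operating system extrapolates the participant's motion and
 * estimates when a hidden surface will "go critical".                    */
#include <stdio.h>
#include <math.h>

typedef struct { double pos, vel, acc; } Motion1D;   /* along the street */

/* Seconds until the participant reaches x_edge, assuming constant
 * acceleration; returns a large value if she never gets there.          */
double time_to_reach(Motion1D m, double x_edge)
{
    double a = 0.5 * m.acc, b = m.vel, c = m.pos - x_edge;
    if (fabs(a) < 1e-9)
        return (b > 0.0) ? -c / b : 1e9;
    {
        double disc = b * b - 4.0 * a * c;
        if (disc < 0.0) return 1e9;
        {
            double t = (-b + sqrt(disc)) / (2.0 * a);
            return (t > 0.0) ? t : 1e9;
        }
    }
}

int main(void)
{
    Motion1D participant = { 0.0, 1.4, 0.1 };   /* strolling at 1.4 m/s   */
    double   x_facade_edge = 0.35;              /* metres to the corner   */
    double   t = time_to_reach(participant, x_facade_edge);

    if (t < 0.110)
        printf("Critical Response: wall goes critical in %.0f ms\n", t * 1000.0);
    else if (t < 0.275)
        printf("Critical Warning: wall expected critical in %.0f ms\n", t * 1000.0);
    else
        printf("no action needed yet\n");
    return 0;
}

A real system would, of course, use whatever predictive model of participant motion its tracking hardware can afford; the thresholds shown merely mirror the figures used in the narrative above.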
Let us now return to our original question: namely, what additional features our display processor and video controller need to possess to make the above scenario possible. Clearly, we need to have a way of <#1500#>grafting<#1500#> a galpixmap frame buffer onto the existing image being propagated by the video controller. This is a similar problem (as, indeed, much of the above scenario is) to that encountered in <#1501#>windowed<#1501#> operating systems. There, however, all objects to be grafted are simply rectangles, or portions thereof. In such a situation, one can encode the shape of the area to be grafted very efficiently by specifying a coded sequence of corner-points. However, our scenario is much more complex: how do you encode the shape of a wall? The answer is: you don't; rather, you (or, more precisely, your display processor) use the following more subtle procedure: Firstly, the current frame buffer that has been allocated to the display processor for rendering purposes is cleared. How do we want to ``clear'' it? Simple: set the <#1502#>debris indicator<#1502#> of each galpixel in the frame buffer to be true. Secondly, the display processor proceeds to render only those objects that it is instructed to---clearing the debris indicators of the galpixels it writes; leaving the other debris indicators alone. Thirdly, when the rendering has been completed, the display processor informs the video controller that its pizza is ready, and when it should deliver it; the display processor goes on to cook another meal. When the stated time arrives, the video controller grafts the image onto its current version of the world, simply <#1503#>ignoring<#1503#> any pixels in the new image that are marked as debris. It is in this way that a wall can be ``grafted'' onto a virtual building without having to bulldoze the whole building and construct it from scratch.
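The grafting procedure just described is simple enough to sketch directly. The following C fragment (with a deliberately skeletal galpixel structure---the real thing would also carry motion and depth information) shows the three steps: clear to debris, render while clearing debris flags, and graft while ignoring debris:

/* A minimal sketch (hypothetical galpixel layout) of debris-flag grafting. */
#define WIDTH  320
#define HEIGHT 200

typedef struct {
    unsigned char debris;   /* 1 = nothing rendered here, leave target alone */
    unsigned char colour;   /* placeholder for colour + motion information   */
} Galpixel;

typedef Galpixel FrameBuffer[HEIGHT][WIDTH];

/* Step 1: "clear" a buffer by marking every galpixel as debris. */
void clear_to_debris(FrameBuffer fb)
{
    int y, x;
    for (y = 0; y < HEIGHT; y++)
        for (x = 0; x < WIDTH; x++)
            fb[y][x].debris = 1;
}

/* Step 2: the display processor writes a galpixel, clearing its debris flag. */
void write_galpixel(FrameBuffer fb, int x, int y, unsigned char colour)
{
    fb[y][x].colour = colour;
    fb[y][x].debris = 0;
}

/* Step 3: at the stated time, graft the new image onto the propagating
 * Galilean frame buffer, ignoring anything still marked as debris.          */
void graft(FrameBuffer dest, const FrameBuffer src)
{
    int y, x;
    for (y = 0; y < HEIGHT; y++)
        for (x = 0; x < WIDTH; x++)
            if (!src[y][x].debris)
                dest[y][x] = src[y][x];
}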
It should be obvious that we need some terminology to distinguish between the frame buffers that the video controller uses to Galileanly propagate the display from frame to frame, and the frame buffers that the display processor uses to generate images that will be ``grafted on'' at the appropriate time. To this end, we shall simply continue to refer to the video controller's propagating frame buffers as ``frame buffers''---or, if a distinction is vital, as ``Galilean frame buffers''. Buffers that the display processor uses to compose its grafted images will, on the other hand, be referred to as <#1504#>rendezvous buffers<#1504#>. (``Meet you under the clocks at Flinders Street Station at 3~o'clock'' being the standard rendezvous arrangement in this city.) Clearly, for the display processor to be able to run at full speed, it should have at least two---and preferably more---rendezvous buffers, so that once it has finished one it can immediately get working on another, even if the rendezvous time of the first has not yet arrived.
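A handful of such rendezvous buffers might be managed along the following lines (a sketch only; the state names and buffer count are invented for illustration):

/* A small sketch of managing several rendezvous buffers: the display
 * processor composes into a free buffer, stamps it with a rendezvous frame,
 * and moves on; the video controller grafts each buffer when its frame
 * number comes up.                                                          */
#define NUM_RVBUFFERS 3

typedef enum { RV_FREE, RV_RENDERING, RV_READY } RvState;

typedef struct {
    RvState       state;
    unsigned long rendezvous_frame;   /* frame number at which to graft */
    /* ... galpixel data would live here ... */
} RendezvousBuffer;

static RendezvousBuffer rv[NUM_RVBUFFERS];

/* Display processor: claim a buffer to compose into, if one is free. */
RendezvousBuffer *claim_buffer(void)
{
    int i;
    for (i = 0; i < NUM_RVBUFFERS; i++)
        if (rv[i].state == RV_FREE) {
            rv[i].state = RV_RENDERING;
            return &rv[i];
        }
    return 0;   /* all busy: the object must wait for a later "train" */
}

/* Display processor: finished composing; hand over with a rendezvous time. */
void submit_buffer(RendezvousBuffer *b, unsigned long frame)
{
    b->rendezvous_frame = frame;
    b->state = RV_READY;
}

/* Video controller: called once per frame scan; graft anything now due. */
void video_controller_tick(unsigned long current_frame)
{
    int i;
    for (i = 0; i < NUM_RVBUFFERS; i++)
        if (rv[i].state == RV_READY && rv[i].rendezvous_frame <= current_frame) {
            /* graft(...) as sketched earlier, skipping debris galpixels */
            rv[i].state = RV_FREE;
        }
}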
It is also worthwhile considering the parallel nature of the system when designing the display controller and the operating system that drives it. At any one point in time, there will in general be a number of objects (possibly a very large number) all passing image-generation information to the display processor for displaying. Clearly, this cannot occur directly: the display processor would not know whether it was Arthur or Martha, with conflicting signals coming from all directions. Rather, the operating system must handle image-generation requests in an organised manner. In general, the operating system will pool the requests of various objects, using its intelligence to decide on when the ``next train from Flinders Street'' will be leaving. Just as with the real Flinders Street Station, image generation requests will be pooled, and definite display rendezvous times scheduled; the operating system then informs each requesting object of the on-the-fly timetable, and each object must compute its control information <#1505#>as projected to the most suitable scheduled rendezvous time<#1505#>. Critical Warning and Critical Response situations are, however, a little different, being much like the Melbourne Cup and AFL Grand Final days: the whole timetable revolves around these events; other objects may be told that, regrettably, there is now no longer any room for them; they may be forced to re-compute their control points and board a later train.
These deliberations bring us to the second of our listed points of consideration for our new image-generation philosophy: <#1506#>timing and synchronisation<#1506#>. The following phrase may be repeated over and over by VR designers while practising Transcendental Meditation: ``Latency is my enemy. Latency is my enemy. Latency is my enemy....'' The human mind is simply not constructed to deal with latency. Echo a time-delayed version of your own words into your ears and you'll end up in a terrible tongue-tied tangle (as prominently illustrated by the otherwise-eloquent former Prime Minister Bob Hawke's experience with a faulty satellite link in an interview with a US network). Move your head around in a visual world that lags behind you by half a second and you'll end up sick as a dog. Try to smack a virtual wall with your hand, and have it smack you back a second later, and you'll probably feel like you're fighting with an animal, not testing out architecture. Latency simply doesn't go down well.
It is for this reason that the above scenario (and, indeed, the Galilean antialiasing technique itself) is rooted very firmly in the philosophy of <#1507#>predictive<#1507#> control. We are not generating the sterile, static world of traditional computer graphics: one <#1508#>must<#1508#> extrapolate in order to be believable. If it takes 100 milliseconds to do something, then you should find a good predictive ``trigger'' for that event that is reasonably accurate 100 milliseconds into the future. Optimising such triggers for a particular piece of hardware may, of course, involve significant research and testing. But if a suitably reliable trigger for an event <#1509#>cannot<#1509#> be found with the hardware at hand, then either get yourself some better hardware, or else think up a less complicated (read: quicker) response; otherwise, you're pushing the proverbial uphill. ``Latency is my enemy....''
With this principle in mind, the above description of a rendezvous buffer involves the inclusion of a <#1510#>rendezvous time<#1510#>. This is the precise time (measured in frame periods) at which the video controller is to graft the new image from the rendezvous buffer to the appropriate Galilean frame buffer. As noted above, some adaptive system performance analysis must be carried out by the operating system for this to work at all---so that objects have a reasonably good idea of just <#1511#>what<#1511#> they can get done in the time allocated to them. Granted this approximate information, image-generation instructions sent to the display processor should then be such that it <#1512#>can<#1512#>, in fact, generate the desired image before the rendezvous time. The time allowed the display processor should be conservative; after all, it can always start rendering another image, into another of its rendezvous buffers, if it finishes the first one early. But it is clear that being <#1513#>too<#1513#> conservative is not wise: objects will unnecessarily underestimate the amount of detail renderable in the available time; overall performance will suffer. An experienced balance must be struck between these two considerations.
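The conservative budgeting described here might look something like the following (the throughput figure and safety factor are, of course, entirely hypothetical, and would in practice come from the adaptive system performance table):

/* A sketch (all figures illustrative) of conservative render budgeting:
 * from measured throughput, estimate how many polygons can be rendered
 * before the rendezvous time, deliberately undershooting by a margin.     */
#include <stdio.h>

typedef struct {
    double polygons_per_ms;   /* measured, smoothed rendering throughput  */
    double safety_factor;     /* e.g. 0.8: assume only 80% will be usable */
} PerformanceTable;

int renderable_polygons(const PerformanceTable *perf, double ms_to_rendezvous)
{
    double budget = perf->polygons_per_ms * ms_to_rendezvous * perf->safety_factor;
    return (int)budget;
}

int main(void)
{
    PerformanceTable perf = { 3.5, 0.8 };   /* hypothetical figures */
    int n = renderable_polygons(&perf, 108.0);
    printf("render at most %d cosine-shaded polygons before rendezvous\n", n);
    return 0;
}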
In any practical system, however, Murphy's Law will always hold true: somewhere, some time, most probably while the boss is inspecting your magnificent creation, the display processor will not finish rendering an image before the rendezvous time. It is important that the system be designed to handle this situation gracefully. Most disastrous of all would be for the operating system to think the complete image <#1514#>was<#1514#> in fact generated successfully: accurate information about the display's status is crucial in the control process. Equally disastrous would be for the display processor to not pass forward the image at all, or to pass it through ``late'': the former for the same reason as before; the latter because images of objects would then continue to propagate ``out-of-sync'' with their surroundings.
One simple resolution of this scenario is for the display processor to <#1515#>finish what it can<#1515#> before the rendezvous time; it then relinquishes control of the buffer (with a call of ``Stand clear, please; stand clear''), and the partial graft is applied by the video controller. The display processor must then <#1516#>pass a high-priority message back to the operating system that it did not complete in time<#1516#>, with the unfulfilled, or partially-fulfilled, instructions simultaneously passed back in a stack. The operating system must then process this message with the highest priority, calling on the objects in question for a Critical Response; or, if these objects subsequently indicate that the omission is not critical, a regular re-paint operation can be queued at the appropriate level of priority. Of course, Incomplete Display Output events should in practice be a fairly rare occurrence, if the adaptive performance analysis system is functioning correctly; nevertheless, their non-fatal treatment by the operating system means that performance can be ``tweaked'' closer to the limits than would otherwise be prudent.
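In code, this graceful-failure path might be sketched as follows (the structures and callbacks are hypothetical; the essential points are simply that the partial graft goes ahead on time, and that the unfinished instructions are handed back rather than silently dropped):

/* A sketch of handling an Incomplete Display Output event: render what we
 * can, relinquish the rendezvous buffer on time, and report the remainder
 * back at high priority.                                                    */
typedef struct RenderInstruction {
    int object_id;
    /* ... geometry, shading and motion control information ... */
    struct RenderInstruction *next;
} RenderInstruction;

typedef struct {
    int                incomplete;    /* non-zero if we ran out of time     */
    RenderInstruction *unfulfilled;   /* instructions not finished in time  */
} DisplayReport;

/* Called by the display processor as the rendezvous time approaches. */
DisplayReport finish_or_bail(RenderInstruction *todo,
                             int (*time_remaining_ms)(void),
                             void (*render)(RenderInstruction *))
{
    DisplayReport report;
    report.incomplete  = 0;
    report.unfulfilled = 0;

    while (todo) {
        if (time_remaining_ms() <= 0) {
            /* Out of time: the partial graft is applied as-is, and the
             * rest is passed back for a Critical Response or re-paint.     */
            report.incomplete  = 1;
            report.unfulfilled = todo;
            break;
        }
        render(todo);              /* complete what we can */
        todo = todo->next;
    }
    return report;
}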
There is another side to the question of timing and synchronisation
that we must now address.
In section~<#1517#>MinimalImplementation<#1517#>, we assumed that
the apparent motion of an object is reasonably well described by
a quadratic expression in time.
This is, of course, a reasonable approximation for global-update systems (in which the inter-update time cannot be left too long anyway)---and is, of course, a vast improvement over current practice. In a local-update system, however, an object's image may be left to propagate itself unsupervised for considerably longer, and the adequacy of a simple quadratic extrapolation over that longer interval must be judged correspondingly more carefully.
This brings us most naturally to the general question of just <#1527#>how<#1527#>
each object in a virtual world
can decide how badly out-of-date its image is.
This is a question that must, ultimately, be answered by
extensive research and experience; a simple method will, however,
now be proposed as a starting point.
Each object knows, naturally, what information it sends to the display processor---typically, this consists of <#1528#>control information<#1528#> (such as polygon vertices, velocities, and so on), rather than a pixel-by-pixel description of the object. The object also knows just <#1529#>how<#1529#> the video controller will extrapolate that control information from frame to frame. By running the same extrapolation itself, and comparing the result with its true current state, the object can estimate how far its self-propagating image has drifted from where it ought to be---and hence how urgently a re-rendering should be requested.
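As a sketch of such a comparison (the structures here are hypothetical, and a real object would no doubt weight its control points by visual importance rather than simply taking the worst case):

/* A sketch of a simple staleness measure: re-run the same quadratic
 * (Galilean) extrapolation the video controller is using on the control
 * points, compare against the true current screen positions, and report
 * the worst discrepancy in pixels.                                        */
#include <math.h>

typedef struct { double x, y; } Vec2;                 /* screen coordinates */
typedef struct { Vec2 pos, vel, acc; } ControlPoint;  /* as last sent       */

/* Where the video controller believes this control point is, t seconds
 * after the last update.                                                   */
static Vec2 extrapolate(const ControlPoint *c, double t)
{
    Vec2 p;
    p.x = c->pos.x + c->vel.x * t + 0.5 * c->acc.x * t * t;
    p.y = c->pos.y + c->vel.y * t + 0.5 * c->acc.y * t * t;
    return p;
}

/* Maximum discrepancy, in pixels, between the propagated image and the
 * object's true current projection; larger means "more out of date".      */
double staleness(const ControlPoint *sent, const Vec2 *truth, int n, double t)
{
    double worst = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        Vec2   p   = extrapolate(&sent[i], t);
        double dx  = p.x - truth[i].x, dy = p.y - truth[i].y;
        double err = sqrt(dx * dx + dy * dy);
        if (err > worst) worst = err;
    }
    return worst;
}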
Another exciting prospect for the algorithmics of image update priority computation is the technique of <#1534#>foveal tracking<#1534#>, whereby the direction of foveal view is tracked by a physical transducer. A VR system might employ this information most fruitfully by having a <#1535#>foveal input device driver<#1535#> (like a mouse driver on a conventional computer) which gathers information about the foveal motion and relays it to the operating system to use as it sees fit. Our foveal view, of course, tends to flick quickly between the several most ``interesting'' areas of our view, regularly returning to previously-visited objects to get a better look. By extracting a suitable overview of this ``grazing'' information and time-stamping it, the foveal device driver can leave it up to the operating system to decipher the information as and when it sees fit. In times of Critical Response, of course, such information will simply be ignored (or stored for later use) by the operating system; much more important processes are taking place. However, at other times, where the participant is ``having a good look around'', this foveal information may (with a little detective work) be traced back to the objects that occupied those positions at the corresponding times; these objects may be boosted in priority over others for the purposes of an improved rendering or texturing process; however, one must be careful not to play ``texture tag'' with the participant by relying too exclusively on this (fundamentally historical) information.
We now turn to our third topic of consideration listed above, namely, ensuring that the display processor can reliably generate rendezvous buffer images in the first place. This is <#1536#>not<#1536#> a trivial property; it requires a careful cooperation between the display processor and the image-generation software. This is because, in traditional global-update environments, the display processor can be sure that <#1537#>all<#1537#> visible objects will be rendered in any update; z-buffering can therefore be employed to ensure correct hidden surface removal. The same is <#1538#>not<#1538#> true, however, in local-update systems: only <#1539#>some<#1539#> of the objects may be in the process of being updated. If there are un-updated objects partially <#1540#>obscuring<#1540#> the objects that <#1541#>are<#1541#> being updated, hidden surface removal must still somehow be done.
One approach to this problem might be to define one or more ``clip areas'', that completely surround the objects requiring updating, and simply render all objects in these (smaller than full-screen) areas. This approach is unacceptable on a number of counts. Firstly, it violates the principles of local updating: we do <#1542#>not<#1542#> wish to re-render all objects in the area (even if it is, admittedly, smaller than full-screen); rather, we only want to re-render the objects that have requested it. Secondly, it carries with it the problem of <#1543#>intra-object mismatch<#1543#>: if one part of an object is rendered with a low-quality technique, and another part of the same object (which may ``poke into'' some arbitrary clip area) with a <#1544#>high<#1544#>-quality technique, then the strange and unnatural ``divide'' between the two areas will be a more intriguing visual feature than the object itself; the participant's consciousness will start to wander back towards the world of reality.
Our approach, therefore, shall be a little subtler. In generating an image, we will consider two classes of objects. The first class will be those objects that have actually been scheduled for re-rendering. The second class of objects will consist of all of those objects, not of the first class, which are known to <#1545#>obscure<#1545#> one or more of the objects of the first class (or, in practice, <#1546#>may<#1546#> obscure an object of the first class, judged by whatever simpler informational subset optimises the speed of the overall rendering procedure). Objects of the first class shall be rendered in the normal fashion. Objects of the <#1547#>second<#1547#> class, on the other hand, will be <#1548#>shadow-rendered<#1548#>: their z-buffer information will be stored, where appropriate, <#1549#>but their corresponding colour information will flag them as debris<#1549#>. ``Clear-frame'' debris (the type used to wipe the rendezvous buffer clean in the first place), on the other hand, will both be marked as debris, and z-depth-encoded to be as far away as possible. Shadow-rendering is clearly vastly quicker than full rendering: no shading or texturing is required; no motion information need be generated; all that is required are the pixels of obscuration of the object: if the galpixel at one such position in the rendezvous buffer is currently ``further away'' than the corresponding point of the second-class object, then that pixel is turned to debris status, and its z-buffer value updated to that of the second-class object; otherwise, it is left alone. In this way, we can graft complete objects into an image, <#1550#>without<#1550#> affecting any other objects, with much smaller overheads than are required to re-render the entire display---or even, for that matter, an arbitrarily selected ``clip window'' that surrounds the requesting objects.
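The per-galpixel logic of shadow-rendering is again simple enough to sketch in C (the field layout is hypothetical; colour and motion information are omitted for brevity):

/* A sketch of shadow-rendering: second-class objects write depth only and
 * flag the galpixel as debris, so the graft leaves the propagating image
 * of the obscuring object itself untouched.                                */
#define FAR_Z 65535u

typedef struct {
    unsigned char  debris;   /* 1 = do not graft this galpixel             */
    unsigned short z;        /* depth; larger = further away               */
    unsigned char  colour;   /* placeholder for colour/motion information  */
} Galpixel;

/* Clear-frame debris: marked as debris AND pushed to the far clip depth. */
void clear_galpixel(Galpixel *g)
{
    g->debris = 1;
    g->z      = FAR_Z;
}

/* First-class objects render normally: colour written, debris cleared. */
void render_galpixel(Galpixel *g, unsigned short z, unsigned char colour)
{
    if (z < g->z) {
        g->z      = z;
        g->colour = colour;
        g->debris = 0;
    }
}

/* Second-class (obscuring) objects are shadow-rendered: if the obscurer is
 * nearer than whatever currently occupies this position, the galpixel
 * becomes debris at the obscurer's depth; otherwise it is left alone.      */
void shadow_render_galpixel(Galpixel *g, unsigned short z_obscurer)
{
    if (z_obscurer < g->z) {
        g->z      = z_obscurer;
        g->debris = 1;
    }
}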
We now turn to the fourth topic of consideration above:
namely, the inclusion of
specialised hardware, over and above that needed
for Galilean antialiasing, that can modify the <#1551#>entire<#1551#> contents of a
frame buffer to take into account the
perspectival effects of the motion of the participant
herself (as distinct from the <#1552#>proper motion<#1552#>--- with respect to
the laboratory---of any of the virtual objects).
Such hardware is, in a sense, at the other end of the spectrum
from the local updating just performed: we now wish to be able
to perform <#1553#>global updates<#1553#> of the galpixmap---but only of
its motional information.
This goal is based on the fact that small changes of acceleration (both
linear and rotational) of
the participant's head will of themselves
jerk the entire display; and all of the information necessary
to compute these shifts (relying on the z-buffer information
that is required for Galilean antialiasing anyway) is already there
in the frame buffer.
Against this, however, is the fact that the
mathematical relations describing such transformations are nowhere
near as strikingly simple as those required for Galilean antialiasing
itself
(which can, recall, be hard-wired with great ease); rather, we
would need some sort of maths coprocessor (or a number of them)
to effect these computations.
The problem is that these computations <#1554#>must<#1554#> be performed during a <#1555#>single<#1555#> frame scan, and added to the acceleration of each galpixel as it is computed (where, recall, the standard Galilean propagation circuitry already updates each galpixel's position and velocity incrementally on every frame scan).
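To give a feel for the arithmetic involved, the following sketch applies just the two simplest first-order corrections---for a small yaw rate and a small lateral head velocity, under an idealised pinhole projection---to a single galpixel's screen velocity; the signs depend on the choice of axes, and a real implementation would of course need the full three-dimensional transformation, not this toy version of it:

/* A first-order sketch only.  For a pinhole projection with focal length f
 * (in pixels), screen position x = f*X/Z; a small yaw rate w and a lateral
 * head velocity vx then give, to first order,
 *     dx/dt (yaw)         ~  w * (f + x*x/f)
 *     dx/dt (translation) ~ -f * vx / z
 * the second term requiring the per-galpixel depth that Galilean
 * antialiasing stores anyway.  Sign conventions are illustrative.          */
#include <stdio.h>

typedef struct {
    double x;     /* screen x-coordinate of this galpixel (pixels)          */
    double z;     /* depth of this galpixel (same units as the translation) */
    double vx;    /* screen x-velocity being propagated (pixels/frame)      */
} Galpixel;

/* Add the participant-motion correction to one galpixel's screen velocity. */
void correct_for_head_motion(Galpixel *g,
                             double focal_px,   /* focal length, in pixels  */
                             double yaw_rate,   /* radians per frame        */
                             double head_vx)    /* metres per frame         */
{
    double yaw_term   = yaw_rate * (focal_px + (g->x * g->x) / focal_px);
    double trans_term = -focal_px * head_vx / g->z;
    g->vx += yaw_term + trans_term;
}

int main(void)
{
    Galpixel g = { 120.0, 3.0, 0.0 };           /* a galpixel 3 metres away */
    correct_for_head_motion(&g, 256.0, 0.002, 0.01);
    printf("corrected screen velocity: %.3f pixels/frame\n", g.vx);
    return 0;
}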
We now turn to the fifth and penultimate topic of consideration
posed above, that of <#1557#>obscuration caching<#1557#>.
The principles behind this idea are precisely the same as those
behind the highly successful technique of <#1558#>memory-caching<#1558#> that is
employed in many processor environments today.
The basic idea is simple: one can make good use of galpixels
that have been recently obscured if they again become unobscured.
The principal situation in which this occurs is where an object
close to the participant ``passes in front of'' a more distant
one, due to parallax.
Without obscuration caching, the closer object ``cuts a swath''
through the background as it goes, which thus requires regular
display processor updates in order to be regenerated.
On the other hand, if, when two galpixels come to occupy the same
display position, the closer one is displayed, and the farther one is
not discarded, but rather
relegated to a <#1559#>cache frame buffer<#1559#>, this galpixel can
be ``promoted'' back from the cache to the main display frame buffer
if the current
position in the main frame buffer becomes unoccupied.
This requires, of course, that the cache have its own ``propagator''
circuitry to propel it from frame to frame,
in accordance with the principles of Galilean antialiasing.
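A per-position sketch of the demote-and-promote logic might run as follows (structures hypothetical; in reality, as just noted, the cache would also need its own propagator circuitry to keep its galpixels moving between frames):

/* A sketch of obscuration caching: when a nearer galpixel lands on an
 * occupied display position, the farther one is demoted to a cache rather
 * than discarded; if the display position later becomes unoccupied, the
 * cached galpixel is promoted back.                                        */
typedef struct {
    unsigned char  occupied;  /* 0 = debris / empty                         */
    unsigned short z;         /* depth; larger = further away               */
    unsigned char  colour;    /* placeholder for colour/motion information  */
} Galpixel;

/* A new galpixel arrives at a display position during propagation. */
void arrive(Galpixel *display, Galpixel *cache, Galpixel incoming)
{
    if (!display->occupied || incoming.z < display->z) {
        /* Incoming galpixel wins the display position; demote the loser.   */
        if (display->occupied)
            *cache = *display;
        *display = incoming;
    } else {
        /* Incoming galpixel is behind the displayed one: cache it instead. */
        if (!cache->occupied || incoming.z < cache->z)
            *cache = incoming;
    }
}

/* At the end of a frame scan, promote cached galpixels into any display
 * positions that have been left unoccupied.                                */
void promote(Galpixel *display, Galpixel *cache)
{
    if (!display->occupied && cache->occupied) {
        *display = *cache;
        cache->occupied = 0;
    }
}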
Another, simpler form of caching, <#1561#>out-of-view buffering<#1561#>, may also be of use in VR systems. With this approach, one builds <#1562#>more memory<#1562#> into each frame buffer than is required to hold the galpixel information for the corresponding display device: the extra memory is used for the logical galpixels <#1563#>directly surrounding<#1563#> the displayable area. An out-of-view buffer may be used in one of two ways. In <#1564#>cache-only out-of-view buffering<#1564#>, the display processor still renders only to the displayable area of memory; the out-of-view buffer acts in a similar way to an obscuration cache. Thus, if the participant rotates her head slightly to the right, the display galpixels move to the left; those galpixels that would have otherwise ``fallen off the edge'' of the memory array are instead shifted to the out-of-view buffer, up to the extent of this buffer. If the viewer is actually in the process of <#1565#>accelerating<#1565#> back to the <#1566#>left<#1566#> when this sweep begins, then in a short time these out-of-view galpixels will again come back into view (as long as they are propagated along with the rest of the display), and thus magically ``reappear'' by themselves, without having to be regenerated by the display processor.
On the other hand, in <#1567#>full out-of-view buffering<#1567#>, the entire out-of-view buffer is considered to be a part of the ``logical display'', of which the physical display device only displays a smaller subset. Objects are rendered into the out-of-view buffer just as to any other part of the display device. This approach can be useful for <#1568#>preparative buffering<#1568#>, especially when specialised head-motion-implementing hardware is present: views of those parts of the virtual world <#1569#>just outside<#1569#> the display area may be rendered in advance, so that if the participant happens to quickly move her head in that direction, then at least <#1570#>something<#1570#> (usually a low-quality rendition) is already there, and the response by the operating system need not be so critical. The relative benefits of out-of-view buffering depend to a great extent on the specific configuration of the system and the virtual worlds that it intends to portray; however, at least a <#1571#>modest<#1571#> surrounding area of out-of-view buffer is prudent on any system: as the participant rotates her head, this small buffer area can be used to consecutively load ``scrolling area'' grafts from the rendezvous buffers a chunk at a time, so that, at least for modest rotation rates, the edge of the display device proper never goes highly critical.
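The book-keeping for full out-of-view buffering is little more than an enlarged array, as the following sketch (dimensions illustrative) suggests:

/* A sketch of full out-of-view buffering: the logical galpixmap is larger
 * than the physical display, and the display device scans out only the
 * central region; the display processor may render into the border just
 * as into any other part of the logical display.                           */
#define DISPLAY_W  320
#define DISPLAY_H  200
#define BORDER      32                      /* out-of-view margin, in pixels */
#define LOGICAL_W  (DISPLAY_W + 2 * BORDER)
#define LOGICAL_H  (DISPLAY_H + 2 * BORDER)

typedef struct { unsigned char colour; } Galpixel;

static Galpixel logical[LOGICAL_H][LOGICAL_W];

/* Store a galpixel anywhere in the logical display; positions just outside
 * the displayable area land in the out-of-view border instead of being
 * clipped (coordinates are relative to the display's top-left corner).     */
int store(int x, int y, Galpixel g)
{
    int lx = x + BORDER, ly = y + BORDER;
    if (lx < 0 || lx >= LOGICAL_W || ly < 0 || ly >= LOGICAL_H)
        return 0;                 /* truly outside even the border: clip */
    logical[ly][lx] = g;
    return 1;
}

/* Scan out only the central DISPLAY_W x DISPLAY_H region for the device. */
void scan_out(Galpixel dest[DISPLAY_H][DISPLAY_W])
{
    int y, x;
    for (y = 0; y < DISPLAY_H; y++)
        for (x = 0; x < DISPLAY_W; x++)
            dest[y][x] = logical[BORDER + y][BORDER + x];
}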
Finally, we must consider in a fundamental way the operating system and application software itself in a VR system, if we wish to apply a local-update philosophy at all. Building the appropriate operating environments, and powerful applications to suit, will be an enormously complicated task---but one that will, ultimately, yield spectacular riches. And herein will lie the flesh and blood of every virtual world, whether it be large or small; sophisticated or simple; a simulation of reality, or a completely fictitious fantasy world. Regrettably, the author must decline any impulse to speculate further on the direction that this development will take, and will leave this task to those many experts who are infinitely more able to do so. He would, nevertheless, be most interested in visiting any such virtual worlds that may be offered for his sampling.
And thus, in conclusion, we come to one small request by the author---the ultimate goal, it may be argued, of the work presented in this paper: Could the software masters at ORIGIN please make a VR version of <#1572#>Wing Commander<#1572#>? I can never look out the side windows of my ship without fumbling my fingers and sliding head-first into the Kilrathi....
<#1573#>Ack<#1573#><#1574#>Acknowledgments<#1574#> Many helpful discussions with A. R. Petty and R. E. Behrend are gratefully acknowledged. This paper, and the supporting software developed to assist in this work, were written on an IBM PS/2 Model 70-386 equipped with an Intel i387 maths coprocessor, running MS-DOS 5.0 and Microsoft Windows 3.1, and employing VGA graphics. The software was developed using the Microsoft C/C++ 7.0 Compiler. The patient assistance rendered by Microsoft Australia Product Support is greatly appreciated.
This work was supported in part by an Australian Postgraduate Research Allowance, provided by the Australian Commonwealth Government.
IBM and PS/2 are registered trademarks of International Business Machines Corporation.
386 and 387 are trademarks, and Intel is a registered trademark, of Intel Corporation.
Wing Commander and ORIGIN are trademarks of ORIGIN Systems, Inc.
Windows is a trademark, and Microsoft and MS-DOS are registered trademarks, of Microsoft Corporation.
Microsoft Worlds may well be a trademark of Microsoft Corporation real soon now.
Galileo should not be trademarked by anybody.
Historical phrases used in this document that have a sexist bias syntactically, but which are commonly understood by English-speakers to refer unprejudicially to members of either sex, have not been considered by the author to be in need of linguistic mutilation. This approach in no way reflects the views of the University of Melbourne, its office-bearers, or the Australian Government. The University of Melbourne is an Affirmative Action <#1575#>(sic)<#1575#> and Equal Opportunity employer. Choice of gender for hypothetical participants in described thought experiments has been made arbitrarily, and may be changed globally using a search-and-replace text editor if so desired, without affecting the intent of the text in any way.
Copyright ©~1992 John P. Costella. Material in this work unintentionally encroaching on existing patents, or patents pending, will be removed on request. The remaining concepts are donated without reservation to the public domain. The author retains copyright to this document, but hereby grants permission for its duplication for research or development purposes, under the condition that it is duplicated in its entirety and unmodified in any way, apart from the abovementioned gender reassignment.
Queries or suggestions are welcome, and
should be addressed to the author; preferably
via the electronic mail address
Printed in Australia on acid-free unbleached recycled virtual paper.